USE OF COMPUTATIONALLY DERIVED PROTEIN STRUCTURES OF
GENETIC POLYMORPHISMS IN PHARMACOGENOMICS AND CLINICAL
APPLICATIONS

RELATED APPLICATIONS 

Benefit of priority to the following applications is claimed herein:

U.S. application Serial No. 09/438,566 to Kalyanaraman Ramnarayan,
Edward T. Maggio and P. Patrick Hess, filed November 10, 1999, entitled
"USE OF COMPUTATIONALLY DERIVED PROTEIN STRUCTURES OF
GENETIC POLYMORPHISMS IN PHARMACOGENOMICS FOR DRUG
DESIGN AND CLINICAL APPLICATIONS"; and U.S. application Serial No.
(Attorney Dkt. No. 24737-1906B) to Kalyanaraman Ramnarayan, Edward
T. Maggio and P. Patrick Hess, filed November 1, 2000, entitled "USE OF
COMPUTATIONALLY DERIVED PROTEIN STRUCTURES OF GENETIC
POLYMORPHISMS IN PHARMACOGENOMICS FOR DRUG DESIGN AND
CLINICAL APPLICATIONS."

Where permitted, the above-noted applications are incorporated by
reference in their entirety. Also incorporated by reference in its entirety
is U.S. application Serial No. (attorney docket no. 24737-1906C), filed
November 10, 2000, entitled "USE OF COMPUTATIONALLY
DERIVED PROTEIN STRUCTURES OF GENETIC POLYMORPHISMS IN
PHARMACOGENOMICS AND CLINICAL APPLICATIONS."

Incorporation by reference of Tables provided on Compact Disks 
For US purposes and where permitted, an electronic version on
compact disk (CD) ROM of Tables 4 and 5, which set forth coordinates
for three-dimensional structures of proteins in the database described
herein, is filed herewith, and, where permitted and for US purposes, the
contents thereof are incorporated by reference in their entirety. Table 4
contains the HIV reverse transcriptase coordinates, and Table 5 contains
the HIV protease coordinates. The files that contain Table 4 are entitled
1906TAB.PC1 and 1906TAB.PC2, created on November 10, 2000, and are
59,538 kilobytes and 304 kilobytes, respectively, and the file that
contains Table 5 is entitled 1906TAB.PC3, created on November 10, 2000,
and contains 11,413 kilobytes.
FIELD OF THE INVENTION 
The present invention is related to computer-based methods and
relational databases that use three-dimensional (3-D) protein structural
models derived from genetic polymorphisms in the areas of
computer-assisted drug design and the prediction of clinical responses
in patients.
BACKGROUND OF THE INVENTION 

Recent advances in molecular biology, such as the discovery and
identification of large numbers of genes and the sequences thereof
encoded in the genomes of humans, other mammals and infectious
disease agents, have contributed to the identification of a large number of
proteins, biological receptors and other macromolecules and complexes
that are promising therapeutic targets. Based on the information derived
from the gene sequences, the three-dimensional (3-D) molecular
structures of the corresponding target proteins or receptors can be
determined.

Since 3-D protein structure is related to biological function,
structure-based drug design is an increasingly useful methodology that
has made a great impact in the design of biologically active lead
compounds. Drug designers can design and screen potential new drugs
via computational methods, such as docking or binding studies, before
actually beginning patient testing. These experiments can be performed
in silico at a tiny fraction of the clinical cost.

The resulting molecules, while serving as lead compounds, often
have unpredictable effects when employed in clinical trials. In addition, it
has been observed that existing drugs with known clinical efficacy
often fail to achieve beneficial results when given to particular patients, or
particular subpopulations, such as ethnic groups, of patients. Genetic
stratification of a population can be the difference between drug failure
and drug approval. Hence there is a need to develop methods to improve
the drug discovery process. Therefore, it is an object herein to
provide, among a variety of benefits, methods and products that address
and solve these problems. In particular, it is an object herein to provide
computationally-based methods for drug design, clinical testing protocols,
identification of new drug candidates and drug therapies; for predicting
drug sensitivity and resistance; and other methods.

SUMMARY OF THE INVENTION

Provided herein are computer-based methods for generating and
using three-dimensional (3-D) structural models of target biomolecules,
particularly polymorphic and allelic variants. Also provided herein are
databases that contain the sequences of such variants and also the 3-D
structure of the variants for use with the methods.

Genetic polymorphisms arise, for example, as a result of gene
sequence differences or as a result of post-translational modifications,
including glycosylation. Hence genetic polymorphisms are manifested as
gene products and proteins having variant structures. The variant
structures result in differences in biological responses among the
originating organisms. These differences in response include, but are not
limited to, differences among patient responses to a particular drug,
effective dosage differences, and side effects. With respect to infectious
organisms, some polymorphisms may arise that convey resistance or
susceptibility to particular drug therapies by altering the drug target
structure.

Structural changes that arise as a result of genetic polymorphisms
are not of unlimited variety, since 3-D structure impacts upon function. A
knowledge of the repertoire of the fine differences among generally similar
3-D structures of particular proteins will permit design of drugs that bind
to the most polymorphisms, drugs that induce the fewest side-effects,
and drugs that are more effective against infectious agents. Knowledge
of these structures ultimately will permit patient-specific or
subpopulation-specific (such as ethnic, age, or gender group) design or
selection of drugs.

The methods that are provided are for determining and using
3-dimensional (3-D) protein structures that are derived from genetic
polymorphisms to understand differences in biological activity that result
from the polymorphisms, and to use this understanding to aid in the
identification of potential new drug candidates and drug therapies. Also
provided are methods for analyzing 3-D structures of protein structural
variant targets derived from genetic polymorphisms to identify common
structural features among the variants; methods for identifying structural
changes in target proteins that are associated with multiple mutations
arising from genetic polymorphisms and correlating this information with
biological activity; methods for using clinical data in conjunction with
structural variants derived from genetic polymorphisms to understand and
predict the pharmacological effects and clinical outcomes for drugs or
potential drugs. Also provided are methods for generating 3-D protein
structures derived from a given genotype to analyze protein-drug binding
in silico to predict drug sensitivity or resistance. Also provided are
databases that are used in methods provided herein and methods for
generating the databases.

In particular, target biomolecules are protein structural variants
encoded by genes containing genetic variations, or polymorphisms. 3-D
models of the structures of proteins are determined. The models are
generated using molecular modeling techniques, such as homology
modeling. The resulting models are then used in the methods provided
herein, which include structure-based drug design studies to design and
identify drugs that bind to particular structural variants; studies to
predict clinical responses in patients; and studies to design drugs that
bind to all or a substantial portion of allelic variants of a
target, to thereby increase the population of patients for whom a
particular drug will be effective and/or to decrease the undesirable
side-effects in a larger population.

Hence, computer-based methods of drug design based on target
protein structural models derived from genetic polymorphisms are
provided. The methods involve obtaining one, preferably two or more,
amino acid sequences of a target protein that is the product of a gene
exhibiting genetic polymorphisms, where the sequences represent different
genetic polymorphisms, and generating 3-D protein structural variant
models from the sequences. Structure-based drug design techniques are
used to design potential new drug candidates or to suggest modifications
to existing drugs based on predicted intermolecular interactions of the
drugs or drug candidates with the models. Alternatively, drug molecules
can be computationally docked with 3-D protein structural variant models
based upon the sequences and energetically refined before performing
structure-based drug design studies.

In preferred embodiments, binding interactions between a drug or
potential new drug candidate molecules and the structural variants are
calculated in order to optimize intermolecular interactions between drug or
potential drug molecules and the structural variant models or to select
drug therapies for patients by determining a drug or drugs that have
favorable binding interactions with the structural variant models.

In other embodiments, the binding interactions are determined by
calculating the free energy of binding between the protein structural
variant model and a docked molecule; and decomposing the total free
energy of binding based on the interacting residues in the protein active
site.
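
By way of illustration only, the decomposition of a total binding energy into contributions from individual residues can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the application: a simple pairwise Lennard-Jones plus Coulomb interaction energy stands in for a free energy of binding, and the function names, parameter values (`epsilon`, `sigma`, `coulomb_k`) and input layout are hypothetical.

```python
import math
from collections import defaultdict


def pair_energy(r, q1, q2, epsilon=0.1, sigma=3.5, coulomb_k=332.0):
    """Illustrative pair energy: 12-6 Lennard-Jones plus Coulomb term.

    All parameter values are placeholders chosen for the sketch, not
    values taken from any particular forcefield.
    """
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = coulomb_k * q1 * q2 / r
    return lj + coulomb


def decompose_by_residue(protein_atoms, ligand_atoms):
    """Return (total, per_residue) interaction energies.

    protein_atoms: list of (residue_id, (x, y, z), charge)
    ligand_atoms:  list of ((x, y, z), charge)
    """
    per_residue = defaultdict(float)
    for res_id, p_xyz, p_q in protein_atoms:
        for l_xyz, l_q in ligand_atoms:
            r = math.dist(p_xyz, l_xyz)
            # Accumulate each protein-ligand atom pair under its residue.
            per_residue[res_id] += pair_energy(r, p_q, l_q)
    total = sum(per_residue.values())
    return total, dict(per_residue)
```

Because the total is defined as the sum of the per-residue terms, the residues that contribute most to binding in this simplified model can be read off by sorting the returned dictionary by value.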

After the protein structural variant models are generated, selected
model structures are analyzed to determine common structural features
that are conserved throughout the selected models. The conserved
structural features can serve as scaffolds or pharmacophore models into
which potential drugs or modified drugs are docked. For example, the
selected model structures may represent the structural variants resulting
from the most commonly occurring genetic polymorphisms or from
genetic polymorphisms found in a specific patient subpopulation, such as
a particular age group, ethnic or racial group, sex, or other subpopulation.
Alternatively, the models may be selected based on clinical information,
for example, the structural variants may be derived based on patients
receiving a specific treatment regimen or exhibiting a particular clinical
response to a given drug or on the duration of a particular drug treatment.

The methods provided herein can be used for predicting clinical
responses in patients based on genetic polymorphisms. For example, a
structural variant model derived from a subject, such as a human patient,
exhibiting a particular genetic polymorphism is generated and screened
against a number of reference protein structural variant models derived
from genetic polymorphisms of the same gene in other such subjects. In
certain embodiments, the reference structures are stored in a database,
preferably with observed clinical data associated with the structures, or
polymorphisms. The structural variant model from the subject is
compared to the reference structures, for example, by database searching,
in order to identify reference structural variants that are similar to the
model structure derived from the subject. Based on the premise that
structurally similar targets will have similar clinical responses, a clinical
outcome can be predicted for the patient based on the structures
identified through structural comparison or database searching. This
information can also be used in the design and analysis of clinical trials; it
can also be used for selecting appropriate therapies for a subject in
instances in which the subject is a patient and the protein is a drug
target.
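
By way of illustration only, the comparison of a subject's structural variant model against stored reference structures can be sketched in Python as follows. This is a sketch under stated assumptions rather than the search procedure of the application: the structures are assumed to contain the same atoms in the same order and to be already superimposed, root-mean-square deviation (RMSD) is used as the similarity measure, and the record keys `id`, `coords` and `clinical_outcome` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def most_similar_references(query_coords, references, top_n=1):
    """Rank reference variants by RMSD to the query model.

    references: list of dicts with the hypothetical keys
                'id', 'coords' and 'clinical_outcome'.
    Returns a list of (id, rmsd, clinical_outcome), closest first.
    """
    scored = [
        (ref["id"], rmsd(query_coords, ref["coords"]), ref["clinical_outcome"])
        for ref in references
    ]
    scored.sort(key=lambda item: item[1])
    return scored[:top_n]
```

Under the stated premise that structurally similar targets respond similarly, the clinical outcome attached to the closest reference serves as the predicted outcome for the subject; a practical implementation would first superimpose the structures and would likely restrict the comparison to the active site.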

The methods are also used to design therapeutic agents that are
active against biological targets that have become drug resistant,
particularly due to genetic mutations. In certain embodiments, 3-D
protein structural variant models are generated for a target protein in
which genetic mutations have occurred and against which a given drug is
no longer biologically active. The models are compared to 3-D protein
structural variant models of the target protein against which the drug has
biological activity in order to identify structural differences between the
susceptible and resistant targets. The differences can be used to
understand the structural contributions to drug resistance, and this
information can be utilized in structure-based drug design calculations to
identify new drugs or modifications to the existing drug that circumvent
the resistance problem.

A computer-based method for identifying compensatory mutations
in a target protein is also provided. The method involves obtaining the
amino acid sequence of a target protein containing multiple amino acid
mutations that is expressed in a patient, where the structure of a form of
the target protein that responds to a particular drug, including the active
site, has been structurally characterized; generating a 3-D structural
model of the mutated protein; comparing the structure of the mutated
protein with the form of the protein that responds to the drug to identify
structural differences and/or similarities arising from the mutations;
comparing the biological activities of the drug against the mutated protein
and the form of the protein that responds to the drug to determine the
effects of the mutations on drug response; and identifying the mutations
in the protein that affect biological activity based on the comparisons.

The target biomolecules can also be used in a method referred to herein
as computational phenotyping to predict drug sensitivity or resistance for
a given genotype. Computer-based methods for identifying
phenotypes in silico are provided. The methods involve obtaining from a
patient specimen, such as a body fluid or tissue sample, including blood,
cerebrospinal fluid, urine, saliva, sweat and tissue samples, the amino
acid sequence of a target protein; generating a 3-D structural model of
the target protein; performing protein-drug binding analyses; and
predicting drug sensitivity or resistance based on the protein-drug binding
analyses.
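
By way of illustration only, the final step of such a method, predicting sensitivity or resistance from a protein-drug binding analysis, can be sketched in Python as follows. The decision rule is an assumption made for this sketch and is not stated in the text: the computed binding energy of the patient variant is compared with that of a drug-sensitive reference form, and a hypothetical cutoff (`threshold`, in the same energy units) separates the two calls.

```python
def predict_phenotype(variant_energy, reference_energy, threshold=1.0):
    """Classify a variant as 'resistant' or 'sensitive' to a drug.

    variant_energy:   computed binding energy of the drug to the variant
    reference_energy: computed binding energy to a drug-sensitive form
    threshold:        hypothetical cutoff on the loss of binding energy

    More negative energies mean tighter binding, so a positive difference
    means the variant binds the drug more weakly than the reference.
    """
    loss_of_binding = variant_energy - reference_energy
    return "resistant" if loss_of_binding > threshold else "sensitive"
```

For example, a variant computed at -8.0 against a reference at -11.0 loses 3.0 units of binding energy and would be called resistant under the default cutoff, whereas a variant at -10.8 would be called sensitive.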

Molecular structure databases containing protein structural variant
models produced by the methods are also provided. The databases may
also contain biological or clinical data associated with the structural
variants. The databases can be interfaced to a molecular graphics
package for visualization and analysis of the 3-D molecular structural
models. In particular, databases containing the 3-D structures of
polymorphic variants of selected target genes, particularly
pharmaceutically significant genes with pharmaceutically significant gene
products, such as proteases and polymerases, including reverse
transcriptases, and receptors, such as cell surface receptors, are
provided. The databases may be stored and provided on any suitable
medium, including, but not limited to, floppy disks, hard drives,
CD-ROMs and DVDs.

Also provided are relational databases for managing and using
information relating to genetic polymorphisms. The databases contain
3-D molecular coordinates for structural variants derived from genetic
polymorphism, a molecular graphics interface for 3-D molecular structure

FIG. 4 shows the correlation between experimental and calculated
changes of binding energy upon ligand modifications in the binding site of
NS3.

FIG. 5 shows a comparison of calculated versus experimental
binding free energy changes for complexes of the tumor necrosis factor
(TNF) receptor with different inhibitors.

FIG. 6 shows the HIV PR inhibitors approved by the FDA.

FIG. 7 shows the frequency versus amino acid residue plot of HIV PR.

FIG. 8 shows frequency analysis of 10,591 HIV PR sequences,
where ResNum is the residue number; TotOcc is the total occurrence of
the mutation; Dist is the distance of the mutating residue from the
approximate center of the active site (Asp28); WtAA is the amino acid in
the wild type protein; NumMut is the number of mutations; and MutList is a
list of amino acid mutations.
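
By way of illustration only, a per-residue frequency tabulation of the kind summarized for FIG. 8 can be sketched in Python as follows. The sketch assumes that all sequences are already aligned to the wild type and have the same length, and it interprets NumMut as the number of distinct substituting amino acids at a position; both are assumptions of the sketch. The Dist column is omitted because it requires 3-D coordinates.

```python
from collections import Counter


def mutation_frequency(wild_type, sequences):
    """Tabulate substitutions per residue position.

    wild_type: wild type amino acid sequence (one-letter codes)
    sequences: variant sequences, pre-aligned and of the same length
    Returns one row per mutated position with the column names
    ResNum, WtAA, TotOcc, NumMut and MutList.
    """
    for seq in sequences:
        if len(seq) != len(wild_type):
            raise ValueError("all sequences must match the wild type length")

    rows = []
    for index, wt_aa in enumerate(wild_type):
        # Count every non-wild-type amino acid observed at this position.
        counts = Counter(seq[index] for seq in sequences if seq[index] != wt_aa)
        if counts:
            rows.append({
                "ResNum": index + 1,             # residue numbering starts at 1
                "WtAA": wt_aa,
                "TotOcc": sum(counts.values()),  # total mutated occurrences
                "NumMut": len(counts),           # distinct substitutions (assumed meaning)
                "MutList": sorted(counts),
            })
    return rows
```

Positions with no observed substitution are left out of the result, so the returned rows list only the polymorphic sites.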

FIG. 9 is a block diagram of an exemplary computer.

FIG. 10 is a graphical representation of a relational database.

FIG. 11 is a tabulation of the 3-D coordinates of a representative
entry in a database that includes 3-D structures.

DETAILED DESCRIPTION OF THE INVENTION

A. Definitions
B. Computer-based methods of drug design based on genetic
   polymorphisms
   1. Methods for obtaining amino acid sequences of a target protein
   2. Generation of 3-D protein structural variant models
      a. Homology Modeling
      b. Ab initio generation of 3-D structures
      c. Crystal structures
   3. Use of 3-D structural variant models in drug design
      a. Selection of relevant structural variants
      b. Drug design
      c. Computational docking
      d. Free energy of binding studies
C. Applications of computer-based methods
   1. Genetic polymorphisms and structure-based drug design
   2. Drug resistance
   3. Identification of conserved structural features or
      pharmacophores
   4. Identification of compensatory structural changes
   5. Clinical Applications
D. Creation of 3-D Structural Polymorphism Databases
   1. Exemplary Databases and generation thereof
   2. Computer systems and Database
E. Computational phenotyping

A. Definitions 

Unless defined otherwise, all technical and scientific terms used
herein have the same meaning as is commonly understood by one of skill
in the art to which this invention belongs. All patents, patent
applications, published patent applications and publications referred to
herein are, unless noted otherwise, incorporated by reference in their
entirety. In the event a definition in this section is not consistent with
definitions elsewhere, the definition set forth in this section will control.

As used herein, polymorphism refers to a variation in the sequence
of a gene in the genome amongst a population, such as allelic variations
and other variations that arise or are observed. Genetic polymorphisms
refer to the variant forms of gene sequences that can arise as a result of
nucleotide base pair differences, alternative mRNA splicing or
post-translational modifications, including, for example, glycosylation.
Thus, a polymorphism refers to the occurrence of two or more genetically
determined alternative sequences or alleles in a population. These
differences can occur in coding and non-coding portions of the genome,
and can be manifested or detected as differences in nucleic acid
sequences, gene expression, including, for example, transcription,
processing, translation, transport, protein processing, trafficking, DNA
synthesis, expressed proteins, other gene products or products of
biochemical pathways or in post-translational modifications and any other
differences manifested among members of a population. A single
nucleotide polymorphism (SNP) refers to a polymorphism that arises as
the result of a single base change, such as an insertion, deletion or
change in a base.

A polymorphic marker or site is the locus at which divergence
occurs. Such a site may be as small as one base pair (an SNP).
Polymorphic markers include, but are not limited to, restriction fragment
length polymorphisms, variable number of tandem repeats (VNTRs),
hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide
repeats, tetranucleotide repeats and other repeating patterns, simple
sequence repeats and insertional elements, such as Alu. Polymorphic
forms also are manifested as different mendelian alleles for a gene.
Polymorphisms may be observed by differences in proteins, protein
modifications, RNA expression modification, DNA and RNA methylation,
regulatory factors that alter gene expression and DNA replication, and any
other manifestation of alterations in genomic nucleic acid or organelle
nucleic acids.

As used herein, structural variant proteins refer to the variety of 3-D
molecular structures or models thereof that result from the
polymorphisms. These variants typically arise from transcription and
translation of genes containing genetic polymorphisms, but also include
differentially glycosylated or otherwise post-translationally modified
variants that potentially exhibit differential interactions with drugs and
drug candidates.

As used herein, binding interactions refer to atomic or physical
interactions between molecules including, but not limited to, binding free
energy, hydrophobic interactions, electrostatic interactions, steric
interactions and other interactions that are commonly considered by those
of skill in the art to determine the affinity of one molecule to bind to
another. Favorable binding interactions refer to binding interactions that
promote physical or chemical associations between molecules.

As used herein, a target protein is defined as a protein that is a
receptor with which drugs or other ligands, such as small molecule or
peptide agonists or antagonists or other proteins or biomacromolecules,
such as DNA or RNA, interact to bring about a biological response.

As used herein, structure-based drug design refers to
computer-based methods in which 3-D coordinates for molecular structures
are used to identify potential drugs that can interact with a biological
receptor. Examples of such methods include, but are not limited to,
searching of small molecule libraries or databases, conformational
searching of a ligand within an active site to identify biologically active
conformations or computational docking methods.

As used herein, pharmacogenetics refers to the study of the variability
of patient responses to drugs due to inherent genetic differences.

As used herein, computational docking refers to techniques
wherein molecules, for example, a ligand and receptor or active site, are
fitted together based on complementary interactions, for example, steric,
hydrophobic or electrostatic interactions.

As used herein, energetic refinement refers to the use of molecular
mechanics simulation techniques, such as energy minimization or
molecular dynamics, or other techniques, such as quantum-based
approaches, to "adjust" the coordinates of a molecular structural model to
bring it into a stable, low energy, conformation. In molecular mechanics
simulations, the potential energy of a molecular system is represented as
a function of its atomic coordinates along with a set of atomic
parameters, called a forcefield. Energy minimization refers to a method
wherein the coordinates of a molecular conformation are adjusted
according to a target function to result in a lower energy conformation.
Molecular dynamics refers to methods for simulating molecular motion by
inputting kinetic energy into the molecular system corresponding to a
specified temperature, and integrating the classical equations of motion
for the molecular system. During a molecular dynamics simulation, a
system undergoes conformational changes so that different parts of its
accessible phase space are explored.
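
By way of illustration only, the two techniques defined above can be sketched in Python on a toy system. The sketch assumes a single coordinate governed by a harmonic "bond" potential as a stand-in for a forcefield, uses steepest descent for energy minimization and the velocity Verlet scheme for integrating the classical equations of motion; the constants `k`, `x0`, `step` and `dt` are arbitrary illustrative values, and the kinetic energy is supplied directly as an initial velocity rather than assigned from a temperature.

```python
def energy(x, k=100.0, x0=1.5):
    """Harmonic potential energy of a single coordinate (toy forcefield)."""
    return 0.5 * k * (x - x0) ** 2


def force(x, k=100.0, x0=1.5):
    """Force is the negative gradient of the potential energy."""
    return -k * (x - x0)


def minimize(x, step=0.001, tol=1e-8, max_iter=10000):
    """Steepest-descent energy minimization: move along the force
    until its magnitude falls below the tolerance."""
    for _ in range(max_iter):
        f = force(x)
        if abs(f) < tol:
            break
        x += step * f
    return x


def velocity_verlet(x, v, mass=1.0, dt=0.001, n_steps=1000):
    """Integrate the classical equations of motion with velocity Verlet.

    Returns the trajectory as a list of (position, velocity) pairs.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory
```

Minimization drives the coordinate to the single energy minimum at `x0`, whereas the dynamics run keeps the total (kinetic plus potential) energy approximately constant while the coordinate oscillates, which is the sense in which molecular dynamics explores accessible conformations rather than settling into one.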

As used herein, clinical data refers to information obtained from
patients pertaining to pharmacological responses of the patient to a given
drug, including, but not limited to, efficacy data, side effects, resistance or
susceptibility to drug therapy, pharmacokinetics or clinical trial results.

As used herein, patient histories include medical histories and
any other information, such as parental medical histories, dates and
places of birth of the patient and parents, number of siblings, number of
children and other such data.

As used herein, compensatory mutations are mutations that act in
concert with active site mutations by compensating for functional deficits
caused by changes or mutations that affect binding in the active site.

As used herein, a relational database is a collection of data items
organized as a set of formally-described tables from which data can be
accessed or reassembled in many different ways without having to
reorganize the database tables. Such databases are readily available
commercially, for example, from Oracle, IBM, Microsoft, Sybase,
Computer Associates, SAP, or multiple other vendors.

As used herein, a phenotype refers to a set of parameters that
includes any distinguishable trait of an organism. A phenotype can be
a physical trait and can be, in instances in which the subject is an animal,
a mental trait, such as an emotional trait. Some phenotypes can be
determined by observation elicited by questionnaires or by referring to
prior medical and other records. For purposes herein, a phenotype is a
parameter around which the database can be sorted.

As used herein, genotype refers to a specific gene or totality of 
genetic information in a specific cell or organism. 

As used herein, haplotype refers to two or more
polymorphisms located on a single DNA strand. Hence, haplotyping refers
to identification of two or more polymorphisms on a single DNA strand.
Haplotypes can be indicative of a phenotype.

As used herein, a parameter is any input data that will serve as a
basis for sorting the database. These parameters will include phenotypic
traits, medical histories, family histories and any other such information
elicited from a subject or observed about the subject. A parameter may
describe the subject, some historical or current environmental or social
influence experienced by the subject, or a condition or environmental
influence on someone related to the subject. Parameters include, but are
not limited to, any of those described herein, and known to those of skill
in the art.

As used herein, computational phenotyping refers to
computer-based processes that assess the phenotype resulting from a
particular genotype. The phenotype describes observables, such as, but not
limited to, the structure of the encoded protein and its functional,
morphological and structural attributes. In particular, as contemplated
herein, the phenotype that is assessed is the interaction of a protein with a
particular compound, particularly a drug. As exemplified herein, the
method provides a means to select an effective drug for particular
subjects, particularly mammals, or a class thereof.

As used herein, a database refers to a collection of data; in this
case data relating to polymorphic variants. Hence a database contains
the nucleic acid sequences encoding the variants, or a portion of the
variant, such as a portion containing the active site or targeted site.
Additionally, the database may contain other information related to each
entry, including, but not limited to, the corresponding 3-D structure of
the encoded protein (or a portion thereof) and information regarding the
source of each sequence. Some of the entries in a database may be
identical, and for purposes herein, a database contains at least 2
different entries, typically far more than 2 entries. The number of entries
depends upon the protein of interest and the variety and number of
polymorphisms that exist. Generally a database will have at least 10
different entries, typically more than 100, more than 500, more than
1000, more than 2000, 3000, 4000, 5000, 8000, 10,000, 50,000,
100,000 and greater. Databases containing 20,000 entries and
more have been generated and are exemplified herein.

As used herein, a relational database stores information in a form
representative of matrices, such as two-dimensional tables, including
rows and columns of data, or higher dimensional matrices. For example,
in one embodiment, the relational database has separate tables each with
a parameter. The tables are linked with a record number, which also acts
as an index. The database can be searched or sorted by using data in the
tables and is stored in any suitable storage medium, such as floppy disk,
CD-ROM disk, hard drive or other suitable medium.
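
By way of illustration only, separate tables linked by a record number can be sketched in Python with the standard `sqlite3` module as follows. The table names, column names, sequences, coordinates, drug label and outcomes are all hypothetical placeholders invented for this sketch; they are not the schema or the data of the databases described herein.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with three tables linked by record_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE variant (
            record_id INTEGER PRIMARY KEY,   -- record number, also the index
            sequence  TEXT NOT NULL
        );
        CREATE TABLE structure (
            record_id INTEGER REFERENCES variant(record_id),
            atom_name TEXT, x REAL, y REAL, z REAL
        );
        CREATE TABLE clinical (
            record_id INTEGER REFERENCES variant(record_id),
            drug TEXT, outcome TEXT
        );
        """
    )
    # Placeholder rows; every value below is made up for illustration.
    conn.executemany(
        "INSERT INTO variant VALUES (?, ?)",
        [(1, "PQITLW"), (2, "PQVTLW")],
    )
    conn.executemany(
        "INSERT INTO structure VALUES (?, ?, ?, ?, ?)",
        [(1, "CA", 0.0, 0.0, 0.0), (2, "CA", 0.1, 0.0, 0.0)],
    )
    conn.executemany(
        "INSERT INTO clinical VALUES (?, ?, ?)",
        [(1, "drug_A", "sensitive"), (2, "drug_A", "resistant")],
    )
    conn.commit()
    return conn


def outcomes_for_drug(conn, drug):
    """Join the variant and clinical tables on the shared record number."""
    query = (
        "SELECT v.record_id, v.sequence, c.outcome "
        "FROM variant AS v JOIN clinical AS c ON c.record_id = v.record_id "
        "WHERE c.drug = ? ORDER BY v.record_id"
    )
    return conn.execute(query, (drug,)).fetchall()
```

Keeping sequences, coordinates and clinical observations in separate tables lets each be searched or sorted on its own parameter while the shared record number reassembles a complete entry on demand.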

As used herein, a profile refers to information relating to, but not
limited to and not necessarily including all of, age, sex, ethnicity, disease
history, family history, phenotypic characteristics, such as height and
weight and other relevant parameters.

As used herein, a biopolymer includes, but is not limited to, nucleic
acid, proteins, polysaccharides, lipids and other macromolecules. Nucleic
acids include DNA, RNA, and fragments thereof. Nucleic acids may be
derived from genomic DNA, RNA, mitochondrial nucleic acid, chloroplast
nucleic acid and other organelles with separate genetic material.

As used herein, a DNA or nucleic acid homolog refers to a nucleic
acid that includes a preselected conserved nucleotide sequence. By the
term "substantially homologous" is meant having at least 80%, preferably
at least 90%, most preferably at least 95% homology therewith or a lesser
percentage of homology or identity and conserved biological activity or
function.
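
By way of illustration only, a percentage threshold of this kind can be checked in Python as follows. The sketch treats homology as simple percent identity over two pre-aligned sequences of equal length without gaps, which is an assumption of the sketch rather than a definition taken from the text; the function names are hypothetical, and the default threshold of 80% corresponds to the lowest of the three percentages recited above.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions at which the two sequences agree.

    Assumes the sequences are already aligned, gap-free and equal in length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def is_substantially_homologous(seq_a, seq_b, threshold=80.0):
    """True when percent identity meets or exceeds the threshold."""
    return percent_identity(seq_a, seq_b) >= threshold
```

Two ten-base sequences that differ at one position are 90% identical, so they meet the 80% and 90% thresholds but not the 95% threshold. The alternative limb of the definition, a lesser percentage together with conserved biological activity or function, cannot be evaluated from sequence alone and is not modeled here.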

As used herein, a receptor refers to a molecule that has an affinity 
15 for a given ligand. Receptors may be naturally-occurring or synthetic 
molecules. Receptors may also be referred to in the art as anti-ligands. 
As used herein, the terms, receptor and anti-ligand are interchangeable. 
Receptors can be used in their unaltered state or as aggregates with other 
species. Receptors may be attached, covalently or noncovalently, or in 
20 physical contact with, to a binding member, either directly or indirectly via 
a specific binding substance or linker. Examples of receptors, include, but 
are not limited to: antibodies, cell membrane receptors surface receptors 
and internalizing receptors, monoclonal antibodies and antisera reactive 
with specific antigenic determinants (such as on viruses, cells, or other 
25 materials), drugs, polynucleotides, nucleic acids, peptides, cofactors, 
lectins, sugars, polysaccharides, cells, cellular membranes, and 
organelles. 
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Examples of receptors and applications using such receptors, ■ 
include but are not restricted to: 

a) enzymes: specific transport proteins or enzymes essential to 
survival of microorganisms, which could serve as targets for antibiotic 

5 (ligand) selection; 

b) antibodies: identification of a ligand-binding site on the antibody 
molecule that combines with the epitope of an antigen of interest may be 
investigated; determination of a sequence that mimics an antigenic 
epitope may lead to the development of vaccines in which the 
immunogen is based on one or more of such sequences, or to the 
development of related diagnostic agents or compounds useful in 
therapeutic treatments such as for auto-immune diseases; 

c) nucleic acids: identification of ligand, such as protein or RNA, 
binding sites; 

d) catalytic polypeptides: polymers, preferably polypeptides, that 
are capable of promoting a chemical reaction involving the conversion of 
one or more reactants to one or more products; such polypeptides 
generally include a binding site specific for at least one reactant or 
reaction intermediate and an active functionality proximate to the binding 
site, in which the functionality is capable of chemically modifying the 
bound reactant (see, e.g., U.S. Patent No. 5,215,899); 

e) hormone receptors: determination of the ligands that bind with 
high affinity to a receptor is useful in the development of hormone 
replacement therapies; for example, identification of ligands that bind to 
such receptors may lead to the development of drugs to control blood 
pressure; and 

f) opiate receptors: determination of ligands that bind to the opiate 
receptors in the brain is useful in the development of less-addictive 
replacements for morphine and related drugs. 

As used herein, prion refers to an infectious pathogen that causes 
central nervous system spongiform encephalopathies in humans and 
animals. No nucleic acid component is necessary for the infectivity of 
prion protein (see, e.g., U.S. Patent No. 5,808,969). 

As used herein, a ligand is a molecule that is specifically recognized 
by a particular receptor. Examples of ligands include, but are not limited 
to, agonists and antagonists for cell membrane receptors, toxins and 
venoms, viral epitopes, hormones (e.g., steroids), hormone receptors, 
opiates, peptides, enzymes, enzyme substrates, cofactors, drugs, lectins, 
sugars, oligonucleotides, nucleic acids, oligosaccharides, proteins, and 
monoclonal antibodies. 

As used herein, complementary refers to the topological 
compatibility or matching together of interacting surfaces of a ligand 
molecule and its receptor. Thus, the receptor and its ligand can be 
described as complementary, and furthermore, the contact surface 
characteristics are complementary to each other. 

As used herein, a ligand-receptor pair or complex is formed when two 
macromolecules have combined through molecular recognition to form a 
complex. 

The terms "homology" and "identity" are often used 
interchangeably. In this regard, percent homology or identity may be 
determined, for example, by comparing sequence information using a GAP 
computer program. The GAP program utilizes the alignment method of 
Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), as revised by Smith 
and Waterman (Adv. Appl. Math. 2:482 (1981)). Briefly, the GAP program 
defines similarity as the number of aligned symbols (i.e., nucleotides or 
amino acids) which are similar, divided by the total number of symbols in 
the shorter of the two sequences. The preferred default parameters for 
the GAP program may include: (1) a unary comparison matrix (containing 
a value of 1 for identities and 0 for non-identities) and the weighted 
comparison matrix of Gribskov and Burgess, Nucl. Acids Res. 14:6745 
(1986), as described by Schwartz and Dayhoff, eds., ATLAS OF PROTEIN 
SEQUENCE AND STRUCTURE, National Biomedical Research Foundation, 
pp. 353-358 (1979); (2) a penalty of 3.0 for each gap and an additional 
0.10 penalty for each symbol in each gap; and (3) no penalty for 
end gaps. 

Whether any two nucleic acid molecules have nucleotide sequences 
that are at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% 
"identical" can be determined using known computer algorithms such as 
the "FASTA" program, using, for example, the default parameters as in 
Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988). 
Alternatively, the BLAST function of the National Center for Biotechnology 
Information database may be used to determine identity. 

In general, sequences are aligned so that the highest order match 
is obtained. "Identity" per se has an art-recognized meaning and can be 
calculated using published techniques. (See, e.g.: Computational 
Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 
1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., 
Academic Press, New York, 1993; Computer Analysis of Sequence Data, 
Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 
1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic 
Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, 
J., eds., M Stockton Press, New York, 1991). While there exist a number 
of methods to measure identity between two polynucleotide or 
polypeptide sequences, the term "identity" is well known to skilled 
artisans (Carillo, H. & Lipton, D., SIAM J Applied Math 48:1073 (1988)). 
Methods commonly employed to determine identity or similarity between 
two sequences include, but are not limited to, those disclosed in Guide to 
Huge Computers, Martin J. Bishop, ed., Academic Press, San Diego, 
1994, and Carillo, H. & Lipton, D., SIAM J Applied Math 48:1073 
(1988). Methods to determine identity and similarity are codified in 
computer programs. Preferred computer program methods to determine 
identity and similarity between two sequences include, but are not limited 
to, GCG program package (Devereux, J., et al., Nucleic Acids Research 
12(1):387 (1984)), BLASTP, BLASTN, FASTA (Altschul, S.F., et al., J 
Molec Biol 215:403 (1990)). 

Therefore, as used herein, the term "identity" represents a 
comparison between a test and a reference polypeptide or polynucleotide. 
For example, a test polypeptide may be defined as any polypeptide that 
is 90% or more identical to a reference polypeptide. 

As used herein, the term "at least 90% identical to" refers to 
percent identities from 90 to 99.99 relative to a reference polypeptide. 
Identity at a level of 90% or more is indicative of the fact that, assuming 
for exemplification purposes that a test and a reference polypeptide, each 
100 amino acids in length, are compared, no more than 10% (i.e., 10 out 
of 100) of the amino acids in the test polypeptide differ from those of the 
reference polypeptide. Similar comparisons may be made between test and 
reference polynucleotides. Such differences may be represented as point 
mutations randomly distributed over the entire length of an amino acid 
sequence or they may be clustered in one or more locations of varying 
length up to the maximum allowable, e.g., 10/100 amino acid difference 
(approximately 90% identity). Differences are defined as nucleic acid or 
amino acid substitutions or deletions. 
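By way of illustration only, the 90% identity comparison described above may be sketched as follows. The sketch assumes that the test and reference sequences have already been aligned to equal length, with "-" marking a deleted position; the function names and the example sequences are hypothetical and are not part of any program named herein.

```python
def percent_identity(test: str, reference: str) -> float:
    """Percent of aligned positions at which two sequences carry the same symbol.

    Assumption: both sequences are pre-aligned to the same length, and a '-'
    (deletion) never counts as a match.
    """
    if len(test) != len(reference) or not reference:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for a, b in zip(test, reference) if a == b and a != "-")
    return 100.0 * matches / len(reference)


def is_at_least_identical(test: str, reference: str, threshold: float = 90.0) -> bool:
    """True when the test sequence meets the stated identity threshold."""
    return percent_identity(test, reference) >= threshold


# Hypothetical 100-residue example with 10 substitutions (one per 10 residues).
reference = "ACDEFGHIKL" * 10
test = "MCDEFGHIKL" * 10
print(percent_identity(test, reference))
print(is_at_least_identical(test, reference))
```

In this example the 10 differences are evenly distributed; the same result is obtained when the differences are clustered, since only the count of differing positions enters the calculation.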

As used herein, AMBER is a force field well known in the art and 
designed for the study of proteins and nucleic acids as defined in Weiner 
et al. J. Comput. Chem. (1986) 7:230-252, where a modified AMBER 
(version 3.3) force field is a fully vectorized version of AMBER (version 
3.0) with coordinate coupling, intra/inter decomposition, and the option to 
include the polarization energy as part of the total energy. AMBER is 
available in commercially available molecular modeling programs such as, 
but not limited to, Macromodel (Columbia University). 

As used herein, ECEPP (Empirical Conformational Energies of 
Peptides Program) is a force field well known in the art (U.S. Patent Nos. 
5,910,478; 5,846,763). ECEPP/3 refers to version 3 of this well known 
force field. 

As used herein, QSAR refers to quantitative structure-activity 
relationship. 

As used herein, vdw refers to van der Waals. 

As used herein, RMSD refers to root-mean-square deviation. 
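For exemplification, RMSD between two sets of N paired atomic coordinates is the square root of the mean of the squared inter-atomic distances. A minimal sketch follows; it assumes that the two structures have already been superimposed and that atoms are paired by index, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumption: the structures are already superimposed; no optimal
    superposition (rotation/translation fitting) is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must have the same non-zero length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms; the second is displaced by 2 units along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
```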

As used herein, medical history refers to the parameters and data 
typically obtained by a physician when examining a subject or other such 
professional when examining other mammals, and includes such 
information as prior diseases, age, weight, height, sex and other 
information. For purposes herein, the subjects that serve as the source of 
the samples from which nucleic acids encoding polymorphisms are isolated 
include animals, plants, pathogens and any organism that has nucleic acid 
that exhibits polymorphism. In this context medical history refers to 
information pertinent to the particular organism. 

As used herein, subject history refers to data such as the locale in 
which the subject was born, raised, resident or visited, and parental 
history and other such information. 

As used herein, a drug is an agent that binds to or interacts with a 
targeted protein. For purposes herein, a therapeutic agent is a drug. 

B. Computer-based methods of drug design based on genetic 
polymorphisms 

Methods for computer-based drug design based on genetic 
polymorphisms are provided. The methods include the steps of obtaining 
one or more, preferably two or more, amino acid sequences of a target 
protein that is the product of a gene exhibiting genetic polymorphisms; 
generating 3-dimensional (3-D) protein structural variant models of all or a 
portion of the protein from the sequences; and, based upon the structures 
of the 3-D models, designing drug candidates or modifying existing drugs 
based on the predicted intermolecular interactions of the drug candidates 
or modified drugs with the structural variants or portions thereof by 
computationally docking drug molecules with the target protein models; 
and then, optionally, energetically refining the docked complexes; 
determining the binding interactions between the drug or potential new 
drug candidate molecules and the models by calculating the free energy 
of binding of the docked complexes and decomposing the total free 
energy of binding based on interacting residues in the protein active site 
or sites deemed important for protein activity. 

A variety of methods that include these steps are provided. Such 
methods have particular application, for example, in predicting patient 
responses. As noted, patients exhibit variable responses to drugs. For 
some patients a drug may be very beneficial and achieve a desired 
response; whereas for other patients, with the same disorder, the same 
drug will have little or no effect. It is known that individuals as well as 
groups of individuals exhibit a variety of genetic polymorphisms. As 
described herein, the presence or absence of such polymorphisms can be 
correlated with the variability of patient responses to drugs. 

It is shown herein that by understanding how genetic 
polymorphisms affect 3-D protein structure of a drug target, for example, it is 
possible to ascertain the interaction of a particular drug with the target in 
a particular patient or groups of patients. Based upon this interaction, the 
outcome can be predicted. It will be possible to determine whether a 
patient will benefit from a drug or be at risk for a particular side effect. It 
is possible to predict these responses before exposure to the drug. These 
methods also permit rational design of drugs that can treat various 
populations or ultimately even individuals. These differences and effects 
can also be taken into account to design drugs that are not dependent 
upon a particular polymorphism. 

Hence, the knowledge derived from understanding the effects of 
genetic polymorphisms can be used to develop and apply therapeutics 
more effectively and to make clinical trials more successful, for example, by 
permitting selection of test subjects with the same polymorphism or with 
polymorphisms for which the drug is designed to interact effectively. 

It is shown herein that it is advantageous to use 3-D molecular 
structures in drug design rather than to consider primary sequence alone. 
For example, most drugs target proteins either in the afflicted organism or 
in a pathogen. Disease, drug action and toxicity are all manifested at the 
protein level. Although the nucleotide sequences of genetic 
polymorphisms might appear to be quite different, the resulting protein 
targets may have similar shapes and, therefore, the protein biological 
function might be the same. Conversely, although genetic polymorphism 
sequences might appear similar, the resulting proteins may have critical 
differences in their 3-D structures that greatly affect biological activity. 

Thus, use of 3-D protein structure models in such methods provides 
advantages not heretofore realized. Methods for generating 3-D structures 
are known to those of skill in the art and are also provided herein. 

Once the protein target structural models have been selected, 
structure-based drug discovery methodologies, for example, 
computational screening or docking programs and methods (e.g., DOCK, 
available from University of California, San Francisco; and AUTODOCK, 
available from Scripps Research Institute, La Jolla), are used to design 
biologically-active compounds based on the 3-D structures of the 
biomolecular receptors. Using these methods, drug designers can identify 
and computationally rank the various potential clinical drug candidates for 
maximum efficacy, thereby performing drug discovery in silico and 
avoiding the time and expense associated with in vitro drug 
discovery methods. 
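The computational ranking step may be sketched, under stated assumptions, as follows. The sketch does not call DOCK or AUTODOCK; it assumes that a predicted binding energy has already been obtained for each drug candidate against each structural variant model, that more negative energies are more favorable, and that candidates are ordered by mean energy across variants. The function name and all numerical values are hypothetical.

```python
def rank_candidates(scores):
    """Rank candidates by mean predicted binding energy across variant models.

    `scores` maps candidate name -> {variant name -> binding energy}.
    Assumption: more negative energy means more favorable binding.
    Returns (candidate, mean energy, least favorable energy) tuples, best first.
    """
    ranked = []
    for candidate, by_variant in scores.items():
        energies = list(by_variant.values())
        mean_energy = sum(energies) / len(energies)
        worst_energy = max(energies)  # least favorable variant for this candidate
        ranked.append((candidate, mean_energy, worst_energy))
    ranked.sort(key=lambda row: (row[1], row[2]))
    return ranked


# Made-up energies (kcal/mol) for illustration only; not results of any study.
example_scores = {
    "candidate_A": {"wild_type": -9.0, "variant_1": -4.0},
    "candidate_B": {"wild_type": -8.0, "variant_1": -7.0},
}
for name, mean_energy, worst_energy in rank_candidates(example_scores):
    print(f"{name}: mean {mean_energy:.2f}, least favorable {worst_energy:.2f}")
```

Reporting the least favorable energy alongside the mean makes visible a candidate that binds one variant well but another poorly, which is the situation the methods herein are intended to detect.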

In addition to drug design applications, the information derived from 
studying the structures of biological targets can be used to understand 
and predict biological responses in patients, such as efficacy, toxicity, 
drug resistance and other pharmacological effects. Since human clinical 
trials may cost upwards of $100-300 million, it is desirable to predict the 
outcome to the greatest extent possible for each prospective drug 
candidate so that the best prospective drug candidates are advanced to 
clinical trials. As described below, methods are provided herein for 
selecting populations for clinical trials. 

1. Methods for obtaining amino acid sequences of a target 
protein 

Any protein, or gene or encoded mRNA, that exhibits 
polymorphisms in structure, herein referred to as the target protein, is 
contemplated for use herein and for generating the databases as provided 
herein. The target protein is a protein, polypeptide, or oligopeptide that 
includes, but is not limited to, receptors, enzymes, hormones, prions, or 
any such compound with which drugs or other ligands, such as small 
molecules, peptide agonists, peptide antagonists, other proteins, nucleic 
acids and other biomacromolecules, interact to bring about a biological 
response. These target proteins occur in any organism, including plants 
and animals, eukaryotes and prokaryotes, including pathogens, such as 
protozoans, parasites, viruses, including DNA viruses and retroviruses, and 
bacteria. The protein or gene can be one expressed in the organism, such 
as a molecule targeted for drug interaction, or one expressed in a 
pathogen. 

The target gene is one that exhibits polymorphisms (i.e., sequence 
variations among a population) and the target protein is the product of a 
gene exhibiting genetic polymorphisms, or sequence variations, as 
described herein. Any gene or protein that exhibits polymorphisms is 
contemplated herein. In particular, genes that encode proteins, 
polypeptides, or oligopeptides that are targets for drug interaction are 
contemplated herein. The genetic polymorphisms can occur in the genes 
of pathogens (e.g., viruses, bacteria, and fungi), parasites, plants, 
animals, and humans. As such, the sequence of a target protein can be 
obtained by the isolation and analysis of the gene or gene product in 
samples taken from pathogens, parasites, plants, animals, and humans, 
most preferably from humans. 

The genes or proteins may be isolated from any source, such as 
animal or plant specimens, or the sequences obtained from any source, 
including known databases. If starting with gene sequences that include 
single or multiple nucleotide polymorphisms, the amino acid sequences of 
the translated proteins can be determined. Protein isolation and 
sequencing methods are well known to those of skill in the art. 
Alternatively, samples of the target protein can be obtained and 
sequenced directly from specimens. Multiple sequence analyses can be 
performed to determine the exact amino acid variations or mutations 
resulting from the genetic polymorphisms. 
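For exemplification, the derivation of amino acid variations from gene sequences that include nucleotide polymorphisms may be sketched as follows. The sketch assumes the standard genetic code, coding-strand DNA read in frame from the first base, sequences containing only A, C, G and T, and substitution-type polymorphisms that leave the length unchanged; the function names and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding DNA (frame 0) to one-letter amino acids, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def amino_acid_variations(reference_dna: str, variant_dna: str):
    """List (1-based position, reference residue, variant residue) substitutions."""
    ref, var = translate(reference_dna), translate(variant_dna)
    return [(i + 1, r, v) for i, (r, v) in enumerate(zip(ref, var)) if r != v]


reference = "ATGGCCAAGTTTTAA"  # codons: ATG GCC AAG TTT TAA
variant = "ATGGCCGAGTTTTAA"    # single nucleotide change: AAG -> GAG
print(translate(reference))
print(amino_acid_variations(reference, variant))
```

A polymorphism that changes a codon to a synonymous codon produces no entry in the list, which is why the comparison is made at the protein level.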

Amino acid sequences of target proteins can also be obtained from 
data banks and databases (e.g., GenBank, Swiss-Prot, PIR) and from 
publications and other sources in which numerous polymorphisms have 
been identified and mapped. Samples may be obtained from, for example, 
blood and tissue banks; nucleic acid can be isolated, genes selected or 
identified, and polymorphisms mapped from such samples. 

2. Generation of 3-D protein structural variant models 
After the amino acid sequences of target proteins are obtained via 
the means described in section 1, the 3-D structural models of the 
sequences of native proteins or of the protein structural variants are then 
determined. They can be determined through experimental methods, such 
as x-ray crystallography and NMR, and from structure databases, such as 
the Protein Data Bank (PDB). Moreover, 3-D structural models can be 
determined by using any of a number of well known techniques for 
predicting protein structures from primary sequences (e.g., SYBYL (Tripos 
Associates, St. Louis, Mo.)), de novo protein structure design programs 
(e.g., MODELER (MSI, Inc., San Diego, CA) and MOE (Chemical 
Computing Group, Montreal, Canada)) and ab initio methods (see, e.g., 
U.S. Patent Nos. 5,331,573, 5,579,250 and 5,612,895), homology 
modeling, and ab initio computational analysis. Homology modeling, 
structure determination based upon x-ray crystallographic structures, and 
ab initio techniques and combinations of these methods are among those 
preferred herein. 

a. Homology Modeling 
Homology modeling is based on the relationship between protein 
evolutionary origin, function and folding patterns. Proteins of related 
origin and function have conserved sequences and structural features 
among the members of a homologous family. Using these relationships, a 
three-dimensional structural model for a protein of unknown structure can 
be constructed by using composite parts of related proteins in the same 
family. Where only the primary amino acid sequence of a target protein is 
known, the sequence can be compared to the sequences of related 
proteins with known structures (reference proteins), and a model can be 
built by incorporating the structural attributes of the reference protein 
together with the sequence of the target protein. 

Sequence homology calculations generally require: the amino acid 
sequence of the target protein; a high resolution structure for at least one, 
but preferably more, related reference proteins; and any other related 
amino acid sequences. The reference proteins include structures which 
are similar to the target protein, either by sequence, fold, or function, or 
which are polymorphisms of the target protein. The more related protein 
structures and sequences that are available or determined, the more 
reliable the technique will be at providing an accurate model. 

In constructing a protein model using homology modeling, 
sequence alignment is performed between the target sequence and any 
known structures within the protein family. Sequence alignment requires 
determining the similarity between protein sequences by maximizing the 
number of matches between the sequences while introducing the 
minimum number of insertions and deletions. Sequence alignment algorithms 
are well known in the art, and standard gap penalties (i.e., programs 
automatically introduce gaps to maximize alignment and then adjust the 
percentage of identity by applying penalties for gap number and gap 
length) and other parameters can be selected by the skilled artisan. 
Additionally, the 3-D structures of the known reference proteins, 
preferably, are aligned to give the best overall fit for the proteins in the 
family. This provides an indication of structurally-conserved regions, such as 
regions of the proteins that do not contain insertions or deletions, among 
the reference structures. 
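The trade-off between maximizing matches and minimizing insertions and deletions may be illustrated by a minimal global alignment in the manner of Needleman and Wunsch. This is a simplified sketch and not the GAP program or any commercial package: it assumes a unary comparison matrix (1 for identities, 0 for non-identities) and a single linear penalty per gap symbol, and the function name and default values are hypothetical.

```python
def needleman_wunsch(seq_a, seq_b, match=1.0, mismatch=0.0, gap=1.0):
    """Globally align two sequences; returns (aligned_a, aligned_b, score).

    Assumptions: unary scoring and a linear per-symbol gap penalty.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align the two symbols
                              score[i - 1][j] - gap,      # gap in seq_b
                              score[i][j - 1] - gap)      # gap in seq_a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_a.append(seq_a[i - 1])
                aligned_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and (j == 0 or score[i][j] == score[i - 1][j] - gap):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[n][m]


a, b, s = needleman_wunsch("GATTACA", "GATACA")
print(a)
print(b)
print(s)
```

With these parameters the two example sequences align with six matches and a single one-symbol gap. Programs used in practice typically distinguish a gap-opening penalty from a gap-extension penalty, which this sketch does not.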

Once the sequences are aligned and the structurally-conserved 
regions are identified, the coordinates of the reference proteins can be used 
to construct a 3-D model of the target structure. Coordinates from the 
protein backbone of the reference proteins are then used to construct the 
backbone framework for the target protein structure. Side chains can be 
constructed, for example, by using side chain coordinates from the 
reference proteins, searching a database to obtain side chain 
conformations that fit in with the existing structural framework, or by 
generating side chains ab initio to establish energetically favorable side 
chain conformations. 

The non-conserved regions of the unknown protein can be 
constructed, for example, using database searching. A database of known 
protein structures (e.g., PDB) can be searched to identify variable regions 
in other proteins that have a high degree of sequence similarity to the 
target sequence and that fit onto the existing structural framework of the 
protein model. Algorithms for performing sequence similarity matching 
and homology model building are well known in the art and are available 
commercially (available from Molecular Simulations, Inc., Tripos, Inc. and 
from numerous academic sources). 

The variable regions can also be modeled by fitting the target 
sequence to a peptide backbone generated by varying phi and psi angles 
(e.g., by calculating Ramachandran or Balasubramanian plots, see 
Balasubramanian (1974) "New type of representation for Mapping Chain 
Folding in Protein Molecules," Nature 266:856-857, or Balaji plots, see 
U.S. Patent Nos. 5,331,573, 5,579,250 and 5,612,895) of the amino 
acids to give a loop structure that can be integrated into the model 
structure based on a sterically and energetically reasonable fit (Figure 1). 

In a Balasubramanian plot, the peptide is depicted as a series of 
different vertical lines, each having solid dots and open circles aligned 
with the corresponding φ, ψ angle values on the vertical axis, and where 
each line corresponds to the particular number of the residue having the 
plotted φ, ψ angles as indicated on a horizontal axis. In the Balaji plot, 
the values of the φ, ψ angles are shown as the base and tip of a vertical 
wedge (assuming a vertical angular axis), respectively, with a separate 
wedge being horizontally positioned on the plot as a function of the 
residue number of the φ, ψ angles plotted. The Balaji plot replaces the 
solid dots and open circles of the Balasubramanian plot with the base of a 
wedge and the tip of a wedge, respectively; and further replaces the 
vertical line joining the dots and open circles of the Balasubramanian plot 
with the body of the wedge. 

b. Ab initio generation of 3-D structures 
Alternatively, ab initio methods can be used in combination with an 
existing partial homologous structure to generate unresolved portions of 
the target structure. Such methods are described, for example, in U.S. 
Patent Nos. 5,331,573, 5,579,250 and 5,612,895, which, as with all patents, 
applications and publications referenced herein, are each incorporated in 
their entirety. These methods involve: simulating a real-size primary 
structure of a polypeptide in a solvent box, i.e., an aqueous environment; 
shrinking the size of the peptide isobarically and isothermally; and 
expanding the peptide to its real size in selected time periods, while 
measuring the energy state and coordinates, i.e., the bonds, angles and 
torsions of the expanding molecule. As the peptide expands to its full 
size, it assumes a stable tertiary structure. In most cases, due to the 
manner in which the expansion occurs, this tertiary structure will be 
either the most probable structure (i.e., it will represent a global minimum 
for the structure) or one of the most probable structures. The energy 
equations used to perform the ab initio simulation are based on the 
potential energy of the simulated molecule as described using molecular 
mechanics. 

Once a model is built, it can be refined using energy minimization, 
molecular dynamics calculations, or simulated annealing as described 
herein. The steric and energetic quality of the structural models is then 
evaluated by analyzing the structural attributes of the model, such as phi 
and psi angles (e.g., by calculating Ramachandran or Balasubramanian or 
Balaji plots), or the energetics of the model, such as by calculating energy 
per residue or strain energy. If the overall quality of the model is not 
satisfactory, further iterative energy refinement can be performed until the 
model is considered to be acceptable (i.e., $e_{av} < 1.5$, see below). 

A preferred method for generating and refining the structural 
variant models is illustrated in FIG. 1. First, at block 100 of FIG. 1, 
protein sequence information, derived from genetic polymorphisms, is obtained 
from the methods described earlier. At block 102, the protein is assigned 
to a protein superfamily in order to identify related proteins to be used as 
templates to construct a 3-D model of the protein. If the superfamily is 
not known, sequence analysis or structural similarity searches can be 
performed to identify related proteins for use as templates in homology 
modeling studies, as described herein, as indicated at block 104. 

Once the conserved regions of the model are assembled, ab initio 
loop prediction (Dudek et al. (1998) J. Comp. Chem. 19:548-573), 
indicated at 106A, or the ab initio secondary structure generation 
techniques of block 106B, in which the alignments are adjusted using 
information on the secondary structure, functional residues, and disulfide 
bonds as described herein, can be used to complete the model (e.g., U.S. 
Patent Nos. 5,331,573; 5,579,250; and 5,612,895). This model, 
complete with loops, is then subjected to refinement procedures (block 
110) based on molecular mechanics, molecular dynamics, and simulated 
annealing methods. Energetic refinement of the structure can be 
accomplished by performing molecular mechanics calculations using, for 
example, an ECEPP type forcefield (Dudek et al. (1998) J. Comp. Chem. 
19:548-573) or through molecular dynamics simulations using, for 
example, a modified AMBER type forcefield (Ramnarayan et al. (1990) J. 
Chem. Phys. 92:7057-7076). As known to those of skill in the art, a 
modified AMBER (version 3.3) force field is a fully vectorized version of 
AMBER (3.0) with coordinate coupling, intra/inter decomposition, and the 
option to include the polarization energy as part of the total energy (see, 
e.g., Weiner et al. (1986) J. Comp. Chem. 7:230-252). If necessary, the 
3-D structures can be dynamically refined, for example, by using a 
simulated annealing protocol (e.g., 100 ps equilibration, 500 ps 
dynamics, up to 1000 K, 1 fs data collection). 

The refinement process step 110 is used to offset problems that 
may arise when homology models are not built carefully or when they are 
built using fully automated methods. Problems that may arise include 
chain breaks (e.g., consecutive Cα atoms are farther apart than the 
optimum distance of 3.7 to 3.9 Å); distorted geometry (e.g., bond lengths 
and bond angles are too far from their optimal values); cis-peptide bonds 
(e.g., incorrect isomerization of the peptide backbone in non-proline 
residues when it is not required); disallowed backbone and side-chain 
conformations (e.g., dihedral angles do not satisfy the Ramachandran plot 
(see Balasubramanian (1974) Nature 266:856-857) criteria for a fully 
favorable protein structure conformation); and misfolded loops (e.g., 
non-homologous loops are generated in unnatural conformations). The 
refinement procedure 110 removes distortions of covalent geometry by 
using energetic methods, converts disallowed backbone and side-chain 
conformations into allowed ones using simulated annealing methods, 
conserves protein core structure and secondary structural elements built 
by homology, and rebuilds unnatural loop constructions (Dudek et al. 
(1998) J. Comp. Chem. 19:548-573). 
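Of the problems listed above, the chain-break check lends itself to a short illustration. The sketch below flags consecutive Cα atoms that are farther apart than 3.9 Å, the upper end of the optimum range stated above; it assumes the Cα coordinates are supplied in residue order as (x, y, z) tuples in Å, and the function name and example coordinates are hypothetical.

```python
import math


def find_chain_breaks(ca_coords, max_distance=3.9):
    """Return (index, distance) for each consecutive Cα pair farther apart than max_distance.

    `index` is the 0-based position of the first residue of the offending pair.
    Assumption: coordinates are in Å and listed in residue order.
    """
    breaks = []
    for i in range(len(ca_coords) - 1):
        distance = math.dist(ca_coords[i], ca_coords[i + 1])
        if distance > max_distance:
            breaks.append((i, distance))
    return breaks


# Made-up coordinates: three normal 3.8 Å steps would be unbroken; the last step is 6.0 Å.
example_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (13.6, 0.0, 0.0)]
for index, distance in find_chain_breaks(example_ca):
    print(f"possible chain break after residue index {index}: {distance:.2f} Å")
```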

For quality control (block 112), the protein structural 
characteristics, for example, stereochemistry (e.g., phi/psi and side chain 
angles), energetics (e.g., strain energy), packing profile (e.g., packing 
factor per residue) and hydrophobic packing are evaluated and required to 
meet acceptable criteria before the structures are used in further studies 
or inputted into a structural polymorphism database. Quality control 
using strain energies entails computing normalized residue energies 
(NREs) based on the equation: 

\[ e_i = \frac{E(i,X) - E_{AV}(X)}{E_{SD}(X)}, \] where 

$E(i,X)$ is the energy of interactions of amino acid X in position i with 
the protein environment and solvent; 

$E_{AV}(X)$ and $E_{SD}(X)$ are the average residue energies and their 
standard deviations calculated for the 20 amino acids in more than 100 
high-quality crystal structures; and 

NREs characterize how favorable the interactions of each residue 
are within the protein environment (Maiorov and Abagyan, (1998) Folding 
& Design 3:259). 

The average NRE characterizes the overall quality of a protein structure 
and is defined as: 

\[ e_{av} = \frac{1}{N} \sum_{i=1}^{N} e_i, \] where 

$e_{av} < 0.5$ denotes high-resolution X-ray crystal structures; 
$e_{av} < 1.0$ denotes good NMR and theoretical models; and 
$e_{av} > 1.5$ denotes structures that require further refinement. 
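The NRE calculation and the thresholds above may be sketched as follows. The per-residue energies and the reference averages and standard deviations in the example are placeholders chosen for arithmetic convenience and are not values from the cited reference; the function names are hypothetical. The text assigns no label to average NREs from 1.0 to 1.5, and the sketch reports that range as unlabeled rather than guessing.

```python
def normalized_residue_energies(residue_energies, reference_stats):
    """Compute e_i = (E(i, X) - E_AV(X)) / E_SD(X) for each residue.

    residue_energies: list of (amino acid type X, energy E(i, X)) in sequence order.
    reference_stats: maps amino acid type X -> (E_AV(X), E_SD(X)).
    """
    nres = []
    for amino_acid, energy in residue_energies:
        average, std_dev = reference_stats[amino_acid]
        nres.append((energy - average) / std_dev)
    return nres


def average_nre(nres):
    """Average NRE, e_av = (1/N) * sum of e_i."""
    return sum(nres) / len(nres)


def quality_label(e_av):
    """Map an average NRE onto the categories stated in the text."""
    if e_av < 0.5:
        return "comparable to high-resolution X-ray crystal structures"
    if e_av < 1.0:
        return "comparable to good NMR and theoretical models"
    if e_av > 1.5:
        return "requires further refinement"
    return "unlabeled range (1.0 to 1.5)"


# Placeholder reference statistics and energies, for illustration only.
stats = {"A": (-2.0, 1.0), "G": (-1.0, 0.5)}
energies = [("A", -1.0), ("G", -1.0), ("A", -2.5)]
nres = normalized_residue_energies(energies, stats)
e_av = average_nre(nres)
print(nres)
print(quality_label(e_av))
```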
After the quality of structure is determined at block 112, the model is 
checked at block 114 to determine if it is satisfactory. If the overall 
quality of the model is not satisfactory, a "No" outcome at block 116, 
then remedial action is undertaken to fix problems at block 118, including 
further iterative energy refinement (block 110), and repeated checking 
(block 114). The refinement and evaluation are repeated until the model is 
considered to be acceptable, a "Yes" outcome at block 120, whereupon 
structural and/or physical properties (e.g., energetics and phi/psi angles) 
are calculated at block 122A and clinical data (if available) are obtained at 
block 122B. The model is then inputted into a structural polymorphism 
database at block 124. 
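The refine-and-check cycle of blocks 110 through 120 may be sketched as a simple loop. The sketch assumes that the caller supplies a refinement function and a quality function returning an average NRE, and that a model is acceptable when that value is below 1.5; the cap on the number of rounds is an added safeguard not stated herein, and all names are hypothetical.

```python
def refine_until_acceptable(model, refine, evaluate, threshold=1.5, max_rounds=20):
    """Repeat refinement and checking until evaluate(model) < threshold.

    Returns (model, score, rounds). Raises RuntimeError if the model is still
    unacceptable after max_rounds refinement rounds (an assumed safeguard).
    """
    score = evaluate(model)          # quality check (block 114)
    rounds = 0
    while score >= threshold:        # "No" outcome (block 116)
        if rounds == max_rounds:
            raise RuntimeError("model did not reach acceptable quality")
        model = refine(model)        # remedial refinement (blocks 118 and 110)
        score = evaluate(model)      # repeated checking (block 114)
        rounds += 1
    return model, score, rounds      # "Yes" outcome (block 120)


# Toy stand-ins: the "model" is just a number standing for its average NRE,
# and each refinement round reduces it by 20%.
def toy_refine(e_av):
    return e_av * 0.8


def toy_evaluate(e_av):
    return e_av


final_model, final_score, rounds_used = refine_until_acceptable(3.0, toy_refine, toy_evaluate)
print(rounds_used, round(final_score, 4))
```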
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FIG. 2 shows an exemplary method for generating structural variant 
models derived from genetic polymorphisms and using them in structure- 
based drug design studies. At the block numbered 200, patient data is 
acquired for a gene that exhibits genetic polymorphisms. Protein 
5 sequence information is then derived, at block 202. A check is made for 
determination of the 3-D structure of the native protein. If the 3-D 
structure has been determined, a "Yes" outcome at block 206, then a 
multiple sequence analysis is performed at block 208 to determine the 
exact amino acid variations for the structure. If the 3-D structure has not 

10 been determined, a "No" outcome at block 210, then the structure is 
determined using physiochemical methods at block 212. 
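As a simplified illustration of determining the exact amino acid variations at block 208, the following Python sketch compares a native sequence with one variant sequence and lists the substitutions. It assumes the two sequences are already aligned, have equal length and contain no gaps; a real multiple sequence analysis would handle insertions and deletions. The function name and the example sequences are hypothetical.

```python
def list_substitutions(native, variant, first_position=1):
    """Return substitutions such as 'K2N' between two pre-aligned,
    equal-length, gap-free amino acid sequences (one-letter codes)."""
    if len(native) != len(variant):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        f"{ref}{position}{alt}"
        for position, (ref, alt) in enumerate(
            zip(native, variant), start=first_position
        )
        if ref != alt
    ]


# Hypothetical five-residue sequences, for illustration only.
substitutions = list_substitutions("MKVLA", "MNVLG")
```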

Next, at block 214, the 3-D structural models for all variants are
generated. A refinement process is then completed at block 216 for the
structural models. As noted above in connection with FIG. 1, the process
involves subjecting each model, complete with loops, to refinement
procedures based on molecular mechanics, molecular dynamics, and
simulated annealing methods. As before, the energetic refinement of the
structure can be accomplished by performing molecular mechanics
calculations using an ECEPP type forcefield (Dudek et al. (1998) J. Comp.
Chem. 75:548-573), or through molecular dynamics simulations using, for
example, a modified AMBER type forcefield (Ramnarayan et al. (1990) J.
Chem. Phys. 52:7057-7076), where a modified AMBER (version 3.3)
force field is a fully vectorized version of AMBER (3.0) with coordinate
coupling, intra/inter decomposition, and the option to include the
polarization energy as part of the total energy (Weiner et al. (1986), J.
Comp. Chem. 7:230-252). If necessary, the 3-D structures can be
dynamically refined, for example, by using a simulated annealing protocol
(e.g., 100 ps equilibration, 500 ps dynamics, up to 1000 K, 1 fs data
collection).

particular indication respond uniformly to the drugs. The drug may not be
efficacious or side-effects may be pronounced.

The methods provided herein represent a further advance in the
use of rational drug design methods. As described herein, polymorphic
variation has an effect upon the 3-D structure of encoded proteins. As a
result, drugs interact with variants differently, leading to differential
responses in the population as a whole. A new approach to drug design
and testing is provided herein. These methods involve identifying
polymorphisms and determining the resulting 3-D structures, which are
then used in methods including computational drug design, selection of
patient populations, design of treatment protocols and other
applications.

2. Drug resistance 

Methods for understanding and overcoming drug resistance by
using 3-D protein model structures resulting from multiple genetic
polymorphisms or mutations in infectious agents, such as viruses,
bacteria and other pathogenic agents, are provided. Also provided are
methods for using this information in drug design studies.

In the case of infectious organisms or other replicating or mutating
agents, such as flu, HIV, rhinovirus or biological warfare agents, some
polymorphisms or mutations may arise over time that confer resistance
or susceptibility to a specific drug therapy, for example, by altering the
structure or physical properties of the drug target so that a specific drug
or therapy, such as an antibiotic or vaccine, may no longer be able to bind
to or otherwise interact with the target protein to exert its desired
biological effect. For certain infectious agents, such as HIV, genetic
polymorphisms in certain genes give rise to drug resistance as the virus
mutates (see, e.g., Erickson et al. (1996) Annu. Rev. Pharmacol. Toxicol.
36:545-571).

At block 218, a quality evaluation is performed for all the models.
As described in connection with the quality evaluation process in FIG. 1,
the evaluation at block 218 involves evaluating the protein structural
characteristics, for example, stereochemistry (e.g., phi/psi and side chain
angles), energetics (e.g., strain energy), packing profile (e.g., packing
factor per residue) and hydrophobic packing, which must meet acceptable
criteria before the structures are used in further studies or entered into a
structural polymorphism database.

After the model quality is determined, at block 220 the models are
checked to determine if they are satisfactory for further use. If a model is
not satisfactory, a "No" outcome at block 222, then the problems are
identified and solved with remedial action at block 224. The remedial
action may include further iterative energy refinement at block 216 and
repeated checks of model quality at block 218. Once the models are
satisfactory, a "Yes" outcome at block 226, structure-based drug design
methods are applied at block 228 to identify potential new drugs that
bind to the structural variant models. The drug design methods are
described further below.
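The refine-and-check loop of blocks 216 through 226 can be sketched in Python as follows. This is only a control-flow illustration: the `evaluate` and `refine` callables, the acceptance threshold and the cap on the number of rounds are hypothetical placeholders, since the acceptance criteria are applied by whatever quality evaluation is actually used.

```python
def refine_until_acceptable(model, evaluate, refine, threshold=1.0, max_rounds=10):
    """Repeat refinement until the quality score falls below `threshold`
    (lower is better here) or `max_rounds` refinements have been made.

    Returns (final_model, final_score, rounds_used)."""
    score = evaluate(model)          # quality evaluation (block 218)
    rounds = 0
    while score >= threshold and rounds < max_rounds:  # "No" outcome (block 222)
        model = refine(model)        # remedial energy refinement (blocks 224, 216)
        score = evaluate(model)      # repeated quality check (block 218)
        rounds += 1
    return model, score, rounds      # "Yes" outcome (block 226) if score < threshold


# Toy stand-ins: the "model" is a single strain value that refinement halves.
result = refine_until_acceptable(
    3.0,
    evaluate=lambda m: m,
    refine=lambda m: m / 2,
)
```

The cap on rounds is a design choice of the sketch so that a model that never converges does not loop indefinitely.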

FIG. 3 shows another exemplary and alternative method for
generating structural variant models derived from genetic polymorphisms
and using them in structure-based drug design studies. The process of
FIG. 3 is similar to the process of FIG. 2 from the initial process at block
300 of acquiring patient data for a gene that exhibits genetic
polymorphisms through the process of obtaining models that are
satisfactory (a "Yes" outcome at block 326). Thus, block numbers in
FIG. 3 from 300 through 326 that correspond to FIG. 2 blocks numbered
from 200 through 226 refer to similar operations. Unlike FIG. 2,
however, the process illustrated in FIG. 3 then involves docking
operations.

At block 328, once the models are determined to be satisfactory,
drug molecules are docked with the structural variant models. Next, at
block 330, the free energy of binding is evaluated with the potential drugs
under study for each structural variant model. At block 332, the total
free energy of binding is decomposed, based on the interacting residues in
the protein active site. Lastly, at block 334, the free energy of binding is
correlated with patient data, if the data is available. Thus, the 3-D
structural data is employed in drug design. Details of using such
structural data in drug design are described further below.

c. Crystal structures

The crystal structure of any protein can be determined empirically
and the resulting coordinates used as the basis for determining structures
of variants. Such structures are often known (see, e.g., Kohlstaedt et al.
(1992) Science 255:1773-1790 for a crystal structure of HIV-1 RT bound
to a ligand).

3. Use of 3-D structural variant models in drug design

The structural differences in protein structural variants that arise
due to genetic polymorphisms can have profound effects on biological
activity. Because of the structural differences among the variants, they
may have different physical or reactive properties and therefore may
exhibit different biological activities. These differences may include, for
example, different responses to a given drug, so that a drug which works
well in a patient with one particular genetic polymorphism may not work
as well in another patient exhibiting a different polymorphism.

The 3-D molecular structures of drug targets derived from genetic
polymorphisms can be used in structure-based drug design studies to
greatly advance the development of new pharmaceuticals. Relational
databases of these 3-D structures that are derived from samplings of
genetic polymorphisms over a patient population or a cross-section of the
population can be used to design potential drugs in order to optimize
effectiveness for the particular population.

The structures and databases described herein can provide
information that is useful, for example, in designing a drug that is
effective in the greatest percentage of the population. It is desirable that
a given drug be effective in the largest percentage of the population, since
such a drug is likely to have the greatest clinical utility and thus the
greatest commercial value. A drug with superior performance properties
is sometimes referred to as a "best in class" drug and is highly prized by
pharmaceutical companies since this heralds market leadership and the
likelihood of commercial success. The databases and methods described
herein can be used to determine 3-D protein structures for drug targets
that are associated with particular genetic polymorphisms and to use the
structures in drug design studies for design and optimization of candidate
drugs that exhibit activity over the broadest patient population.

Genetic polymorphisms may result in target protein structural
variants in which drug efficacy correlates with specific populations or
subpopulations. In some cases, it might be desirable to target drug
design or drug therapy toward a specific patient population, such as a
particular race, gender, or age group, affected by a certain disease or
condition or toward those having a specific genetic polymorphism. The
information derived from comparing the 3-D structural variants arising
from different genetic polymorphisms may be useful for understanding
why drugs are active or inactive in different subpopulations, or for
assisting in developing new drugs to maximize efficacy across specific
populations.

a. Selection of relevant structural variants

The structural variant models in the structural polymorphism
database provided herein can be used to design new drugs or to select a
drug therapy that would be appropriate for a patient exhibiting a particular
genetic polymorphism. As it may not be possible for a drug to work
equally well for all polymorphisms, and thus all patients, representative
structural variants can be selected for use in drug design studies in order
to maximize biological activity based on genetic polymorphisms.

In some cases, structural variants are analyzed to determine the
common structural features that are conserved through the selected
models. These conserved features are used as a basis for drug design.
In some cases, the structural variant corresponding to the genetic
polymorphism occurring most commonly in a population can be selected
for use in identifying drugs that would be effective in the greatest
percentage of the population. Optionally, structural variants
corresponding to a relevant subpopulation, such as a particular gender,
age, race, or other characteristic, can be selected for use in designing
drugs that are active in that subpopulation. In other cases, individual
structural variant models can be selected for use in designing drugs that
are specifically active against one target in one individual arising from a
particular genetic polymorphism. Additionally, model structures that
represent variants derived from patients that receive a specific treatment
regimen or exhibit a particular clinical response (e.g., drug resistance) to a
given drug are used as bases for drug design.
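One of the selection criteria above, choosing the variant that occurs most commonly in a population or in a subpopulation, can be sketched in Python as follows. The record layout, field names and variant labels are hypothetical and stand in for whatever patient and polymorphism data the database actually holds.

```python
from collections import Counter


def most_common_variant(records, **criteria):
    """Return the most frequent variant label among the patient records
    that match all keyword criteria (e.g. gender="M"), or None if no
    record matches."""
    matching = [
        record["variant"]
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]
    if not matching:
        return None
    return Counter(matching).most_common(1)[0][0]


# Hypothetical patient records, for illustration only.
records = [
    {"variant": "V1", "gender": "F"},
    {"variant": "V1", "gender": "F"},
    {"variant": "V1", "gender": "M"},
    {"variant": "V2", "gender": "M"},
    {"variant": "V2", "gender": "M"},
]
overall = most_common_variant(records)
among_men = most_common_variant(records, gender="M")
```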

The relevant structural variants may be identified using the
structural analysis tools described herein, optionally in combination with
database and statistical analysis tools that permit a complete analysis and
comparison of the molecular structures and properties of the structural
variants. The structural variants selected based on criteria including,
but not limited to, those listed above are used in drug design.

b. Drug design 

Once the protein target structural models have been selected,
structure-based drug discovery methodologies, for example,
computational screening or docking (e.g., DOCK, available from the
University of California, San Francisco, and AUTODOCK, available from
Scripps Research Institute, La Jolla, and others referenced herein or known
to those of skill in the art), can then be used to design biologically active
compounds based on the 3-D structures of the biomolecular receptors.

Using these methods, drug designers can identify and
computationally rank various potential clinical drug candidates for
maximum efficacy, thus cutting the time and expense associated with
drug discovery. The preferred design of drug candidates or the
modification of existing drugs is based on the intermolecular interactions
between the drug candidate or modified drugs and the selected structural
variants predicted by computationally docking drug molecules with the
target protein models; energetically refining the docked complexes; and
determining the binding interactions between the drug or potential new
drug candidate molecules and the models by calculating the free energy
of binding of the docked complexes and decomposing the total free
energy of binding based on interacting residues in the protein active site
or sites deemed important for protein activity.

c. Computational docking 

Methods for using the structural variant models to design potential
new drugs or to aid in the selection of a drug therapy based on the
interactions of selected small molecules with the particular variants are
provided. Structure-based drug design experiments, such as
computational screening or docking studies, calculation of binding
energies or analysis of steric, electrostatic or hydrophobic properties of
the resulting structural variant models, can be performed on selected
structural variant models to aid in the understanding of observed
biological activities or to determine new potential drug candidates to bind
to the particular target.

In a typical computational docking protocol, the active site, or sites
deemed important for protein activity, of the protein model is defined. A
molecular database, such as the Available Chemicals Directory (ACD) or
any database of molecules, is screened for molecules that complement
the protein model. Solvation parameters are factored in (see, e.g.,
Shoichet et al. (1999) PROTEINS: Structure, Function, and Genetics
34:4-16). In these computational docking studies, drugs or drug candidates
are fitted to the structural variant models based on complementary
interactions (e.g., steric, hydrophobic, or electrostatic interactions).
Methods for performing such studies are well known and software tools
for performing the calculations are widely available (M. Lambert, "Docking
Conformationally Flexible Molecules into Protein Binding Sites" in Practical
Application of Computer-Aided Drug Design, Charifson, Ed., Marcel
Dekker, NY, pp. 243-303; Kuntz (1992) Science 257:1078-1082; Kuntz
et al. (1982) J. Mol. Biol. 161:269-288; Stewart et al. (1992) Med.
Chem. Res. 1:439-443; Shoichet et al. (1993) Science 255:1445-1450;
Shoichet et al. (1991) J. Mol. Biol. 227:327-346).

New potential drug candidates can be designed by identifying
potential small molecule drugs that can bind to a particular structural
variant. This is accomplished, for example, by methods including, but
not limited to, methods for electronic screening of small molecule
databases as described herein, methods involving modifying the
functional groups of existing drugs in silico, and methods of de novo
ligand design. Methods for computationally designing drugs are known to
those of skill in the art and include, but are not limited to, DOCK (Kuntz
et al. (1982) "A Geometric Approach to Macromolecule-Ligand
Interactions", J. Mol. Biol., 161:269-288; available from University of
California, San Francisco); AUTODOCK (see, Goodsell et al. (1990)
"Automated Docking of Substrates to Proteins by Simulated Annealing",
Proteins: Structure, Function, and Genetics, 8, pp. 195-202; available
from Scripps Research Institute, La Jolla); GRID (Oxford University,
Oxford, UK); CAVEAT (UC Berkeley, CA); LEGEND (Molecular Simulations,
Inc., San Diego, CA); LUDI (Molecular Simulations, Inc., San Diego, CA);
HOOK (Molecular Simulations, Inc., San Diego, CA); CLIX (CSIRO,
Australia); GROW (Upjohn Laboratories, Kalamazoo); others including
HINT, LUDI, NEWLEAD, HOOK, PRO-LIGAND and CONCERTS (see, M. Murcko,
"An Introduction to De Novo Ligand Design" in Practical Application of
Computer-Aided Drug Design, Charifson, Ed., Marcel Dekker, NY, pp.
305-354); methods based on QSAR (quantitative structure-activity
relationships; QSAR and Drug Design: New Developments and
Applications, Fujita, Ed., (1995) Elsevier, pp. 3-81; 3D QSAR in Drug
Design, Kubinyi, Ed., (1993) Escom, Leiden); and other methods known
to those of skill in the art for determining molecules that have optimal
binding interactions with a selected target.

The docked complexes, if needed, are further refined energetically
to optimize geometries within the binding site and to select the best
structure from a set of possible structures, using molecular mechanics,
molecular dynamics, and simulated annealing techniques, including those
described herein and others that are known to those skilled in the art.

d. Free energy of binding studies

After the computational docking step, the free energy of binding of
the docked complex is calculated, and the total free energy of binding is
decomposed based on the interacting residues in the protein active site or
sites deemed important for protein activity. Analyses of the binding
energies are needed to identify drug candidates. If needed or desired, the
free energy of binding of different drugs or potential drugs to each
structural variant model can be calculated by subtracting the free energy
of the non-interacting protein and drug from the free energy of the
protein-drug complex. The total free energy of binding is decomposed
into its various thermodynamic components, e.g., enthalpic and entropic
components, based on the interacting residues in the protein active site in
a solvated model to characterize the structural and thermodynamic
features in the mode of drug binding and to determine the contribution of
the solvent (see, e.g., Wang et al. (1996) J. Am. Chem. Soc. 118:995-
1001; Wang et al. (1995) J. Mol. Biol. 253:473-492; Ortiz et al. (1995)
J. Med. Chem. 35:2681-2691, which describes a computational method
for deducing QSARs from ligand-macromolecule complexes). Following
the computational drug design protocol described herein, any potential
new drugs that are identified can be synthesized in, for example, industry
or academia, and subjected to further biological testing, such as in vitro
studies or pre-clinical and clinical in vivo testing.
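The subtraction described above, the free energy of the complex minus the free energies of the non-interacting protein and drug, together with a simple ranking of per-residue contributions, can be sketched in Python as follows. The function names, residue labels and energy values are hypothetical; how the individual free energies and per-residue terms are computed is outside the scope of this sketch.

```python
def binding_free_energy(g_complex, g_protein, g_drug):
    """Free energy of binding: free energy of the protein-drug complex
    minus the free energies of the non-interacting protein and drug."""
    return g_complex - (g_protein + g_drug)


def rank_residue_contributions(contributions):
    """Sort a {residue: contribution} decomposition of the binding free
    energy so that the most favorable (most negative) residues come first."""
    return sorted(contributions.items(), key=lambda item: item[1])


# Illustrative values in arbitrary energy units; not computed from any structure.
dg_bind = binding_free_energy(g_complex=-120.0, g_protein=-80.0, g_drug=-30.0)
per_residue = {"Val82": 0.5, "Asp25": -4.0, "Ile50": -2.5}
ranked = rank_residue_contributions(per_residue)
```

Ranking the decomposition in this way makes it easy to see which active-site residues dominate binding for a given structural variant.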

Based on the predicted intermolecular interactions of the drugs or
modified drugs with the structural variant models from binding studies,
potential drug candidates that are specific for a protein with a selected
polymorphism or that specifically interact with all proteins exhibiting the
polymorphism can be identified.

It is also possible to individualize drug design or drug therapy by
determining the structural variants associated with a particular patient and
then designing or screening drugs or potential drugs to maximize efficacy
in that subject or in a subpopulation that exhibits the same genetic
polymorphism. The variants may also be used to track polymorphic
variations in infectious organisms, such as viruses. For example, the
human immunodeficiency virus (HIV) reverse transcriptase and
protease have served as drug targets (see, Erickson et al. (1996) Annu.
Rev. Pharmacol. Toxicol. 36:545-571); their three-dimensional structures
are known (see, e.g., Nanni et al. (1993) Perspectives in Drug Discovery
and Design 7:129-150; Kroeger et al. (1997) Protein Eng. 70:1379-
1383). The clinical emergence of drug-resistant variants of these viruses
has limited the long-term effectiveness of drugs targeted against these
enzymes.

As noted, these enzymatic proteins must exhibit conserved 3-D
structures in order to preserve function. The methods herein permit design
of drugs specific for the conserved regions of the 3-D structures. They
also permit selection of drug regimens based upon the alleles expressed.
Hence, methods for designing HIV enzyme-specific drugs are provided.
Flow charts illustrating exemplary alternative embodiments using protein
3-D structures derived from genetic polymorphisms in structure-based
drug design studies are provided (see, FIGS. 2 and 3). In the flow charts
depicted in these figures, the drug design includes structure-based drug
design methods (see, FIG. 2) and computational docking of drugs with
structural variants, evaluation of the binding energy of the docked
complexes, and correlation of the binding energy with patient data such
as age, gender, race, drug treatment history, and any other pertinent
information that is available (see, FIG. 3). The data generated by this
computer-based method can be stored in a database, such as, for
example, a relational database. The resulting database can be screened
using searching tools to select potential drugs and therapeutic agents that
bind to or exhibit biological responses towards target proteins.
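Storage of the generated data in a relational database and screening of that database can be illustrated with a small Python sketch using the standard `sqlite3` module. The two-table schema, the column names, the drug labels and the energy values are all hypothetical; the query selects drugs whose predicted binding free energy is at or below a cutoff for every stored variant of a protein.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE variant (
        id INTEGER PRIMARY KEY,
        protein TEXT NOT NULL,
        mutations TEXT NOT NULL
    );
    CREATE TABLE binding (
        variant_id INTEGER NOT NULL REFERENCES variant(id),
        drug TEXT NOT NULL,
        dg_bind REAL NOT NULL
    );
    """
)
# Illustrative rows only; the energies are invented for this example.
conn.executemany(
    "INSERT INTO variant VALUES (?, ?, ?)",
    [(1, "protease", "wild type"), (2, "protease", "V82A")],
)
conn.executemany(
    "INSERT INTO binding VALUES (?, ?, ?)",
    [(1, "drugA", -9.0), (2, "drugA", -8.5), (1, "drugB", -10.0), (2, "drugB", -4.0)],
)


def drugs_binding_all_variants(conn, protein, cutoff):
    """Drugs whose binding free energy is <= cutoff for every variant of
    `protein` stored in the database."""
    rows = conn.execute(
        """
        SELECT b.drug
        FROM binding AS b
        JOIN variant AS v ON v.id = b.variant_id
        WHERE v.protein = ?
        GROUP BY b.drug
        HAVING MAX(b.dg_bind) <= ?
           AND COUNT(DISTINCT v.id) =
               (SELECT COUNT(*) FROM variant WHERE protein = ?)
        ORDER BY b.drug
        """,
        (protein, cutoff, protein),
    ).fetchall()
    return [row[0] for row in rows]


broad_binders = drugs_binding_all_variants(conn, "protease", -8.0)
```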



WO 01/35316    PCT/US00/30863

C. Applications of computer-based methods 

As discussed above, the computer-based methods provided herein
include some or all of the steps of obtaining one or more, preferably two
or more, amino acid sequences of a target protein that is the product of a
gene exhibiting genetic polymorphisms; generating 3-dimensional (3-D)
protein structural variant models from the sequences; and, based upon the
structures of the 3-D models, designing drug candidates or modifying
existing drugs based on the predicted intermolecular interactions of the
drug candidates or modified drugs with the structural variants by
computationally docking drug molecules with the target protein models;
energetically refining the docked complexes; and determining the binding
interactions between the drug or potential new drug candidate molecules
and the models by calculating the free energy of binding of the docked
complexes and decomposing the total free energy of binding based on
interacting residues in the protein active site or sites deemed important
for protein activity. There are numerous applications of these methods,
which include structure-based drug design and drug testing, selection of
clinically relevant populations for drug testing and other such methods.
1. Genetic polymorphisms and structure-based drug design

As noted above, structure-based drug design is an increasingly
useful methodology that has made a great impact in the design of
biologically active lead compounds. Drug designers can design and
screen potential new drugs via computational methods, such as docking
or binding studies, before actually beginning patient testing. The drugs
designed by such methods, and also those identified by traditional
methods of drug discovery, are then tested in clinical trials. Those
that show efficacy for a particular indication and low toxicity are
ultimately approved for use. It is found, however, that not all patients with a

Where drug resistance that arises from mutations or polymorphisms
is observed, the methods described herein can be used to develop new
drugs that overcome the resistance. For example, once drug resistance is
observed, the structure associated with the resistant polymorphism can
be determined and used in further drug design studies to suggest new
drugs or modifications to the existing drug that will restore biological
activity by targeting different mutants or that will target multiple mutants
simultaneously.

The model structures can also be used to correlate drug resistance
in infectious diseases with the structural variants derived from genetic
polymorphisms. Here, the 3-D structure of the virus or other drug target
is determined for the particular variant model against which the drug was
effective. When drug resistance arises due to a genetic polymorphism, a
model for the structural variant associated with the resistant organism can
be generated, and a new drug can be designed or modifications can be
made to the existing drug to overcome the resistance.

For example, samples of the mutating organism can be obtained
over time and structural models for the resulting proteins can be
generated. These models can then be used to design new drug therapies
that are active against the mutated organism. Multiple drug resistant
structures can be analyzed to obtain an average structure or to identify
common structural features in order to design new drugs that have the
broadest spectrum of activity against multiple mutations.

Such structural information is useful in designing effective drug
therapies to overcome resistance or to develop drugs that are effective
over a range of genetic polymorphisms and thus work for the maximum
number of patients.

3. Identification of conserved structural features or
pharmacophores

If common structural features are observed over a range of protein
targets that are derived from genetic polymorphisms, these common
features may be used to design a drug that is effective with a variety of
genetic polymorphisms and thus many patients. The retention of certain
common structural features over a large number of genetic
polymorphisms suggests that those features may not be mutatable
because the conserved structure may be essential to protein function,
e.g., to the viability of an infectious organism or virus. Such conserved
structural elements are prime targets for structure-based drug design,
e.g., anti-infective or antibiotic drug design, and can lead to highly
effective therapies.

The common structural features can serve as a basis for
structure-based drug design, for example, by serving as a scaffold for
building a receptor model into which potential drug candidates can be
docked or as a pharmacophore query for screening a library of physical or
virtual chemical or biochemical molecules to identify compounds that
match the pharmacophore template and, thus, are potential drug candidates.

Analysis of 3-D protein structural variants derived from genetic
polymorphisms to identify the common structural features over a large
number of structural variants can aid in the design of drugs that are active
over a broad range of genetic polymorphisms, such as in a large number
of patients or against drug resistant targets.

In comparing sets of related protein structures, such as those with
the same biological function or those resulting from genetic
polymorphisms, certain parts of the structural framework are often found
to be conserved, while other parts vary among the proteins. Mutations
that occur in the conserved regions of the structure can have significant
effects on biological activity. For example, in viruses, the conserved
features can be essential to protein function and, thus, to the viability of
the infectious organism or virus. Identifying the conserved structural
features over a range of structures often gives insight into which
structural features are necessary for biological activity and are therefore
non-mutatable. By analyzing a number of structural variants derived from
genetic polymorphisms that exhibit drug resistance, it is possible to
identify or design drugs that interact best with the common structural
features in all of the variants. Using these features in structure-based
drug design studies leads to the identification of drugs that retain
biological activity despite multiple mutations, or polymorphisms, and
could help to overcome the problem of drug resistance.

In certain preferred embodiments, new potential drug candidates
can be identified using the structural variant models by identifying
pharmacophores or conserved features in the protein structural variant
models and using this structural information to identify small molecules
that would bind to the structural variant models.

Using structural comparison tools described herein, the common
structural features that are conserved across a range of structural variant
models of a given protein based on different genetic polymorphisms can
be identified. To do this, multiple structural variant models are compared,
generally by superimposing the coordinates of one variant model onto
those of one or more other variants and observing the structural fit. Such
functionality is commonly found in molecular graphics or homology
modeling packages. Once the optimum fit of structures is performed,
then the structural features that are present throughout the structural
variant models can be identified and used as the basis for drug
interactions in structure-based drug design studies. For example, the
pharmacophores or conserved features can be specified as database
queries and a library or database of small molecule structures can be
searched to identify new lead compounds to bind to the pharmacophores.
Alternatively, other structure-based ligand design strategies can be
employed to design lead compounds or to identify modifications to be
made to existing drugs to improve biological activity.
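Once variant models have been superimposed, positions that are conserved across all of them can be flagged by how far the equivalent atoms move between models. The Python sketch below assumes that the superposition has already been carried out by a modeling package, that every model lists one (x, y, z) coordinate per residue in the same order, and that a displacement tolerance of 1.0 (in the coordinate units, e.g. angstroms) defines "conserved"; the tolerance and the coordinates are hypothetical.

```python
from math import dist  # available in Python 3.8 and later


def conserved_positions(models, tolerance=1.0):
    """Return 0-based residue indices whose coordinates differ by at most
    `tolerance` between every pair of pre-superimposed variant models.

    models: list of models, each a list of (x, y, z) tuples of equal length."""
    n_residues = len(models[0])
    if any(len(model) != n_residues for model in models):
        raise ValueError("all models must contain the same number of residues")
    conserved = []
    for index in range(n_residues):
        points = [model[index] for model in models]
        spread = max(dist(p, q) for p in points for q in points)
        if spread <= tolerance:
            conserved.append(index)
    return conserved


# Two toy three-residue models; the middle residue is displaced in variant_b.
variant_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
variant_b = [(0.0, 0.0, 0.5), (1.0, 3.0, 0.0), (2.0, 0.0, 0.0)]
conserved = conserved_positions([variant_a, variant_b])
```

The resulting index list could then be used to restrict a pharmacophore query or a docking site to the conserved part of the structure.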
4. Identification of compensatory structural changes

Certain proteins, for example, proteins of viruses or other infectious
organisms, may harbor multiple genetic polymorphisms. Since each
genetic polymorphism can give rise to slight changes in structure, some,
and over time, many, additional genetic polymorphisms may cause
changes in the protein structures that significantly affect biological
activity. These structural changes could result in, for example, different
dynamical behavior, alteration in enzyme kinetics or differences in
substrate recognition, which can significantly alter drug response. For
example, a resistance mutation for one drug compound can suppress a
resistance mutation to a second drug due to compensatory effects. In
these cases, a drug which is predicted to be ineffective for a given patient
based upon the single nucleotide correlation may, in fact, be effective as a
result of these changes.

Because mutations are so frequent in HIV (the AIDS virus) and other
viruses, few sequences are exactly the same in different patients. Thus, it
is difficult, and often inconclusive, to generate multiple mutation sequence
correlations for drug resistance. If each patient has a different viral
sequence due to a high viral mutation rate, then no sequence correlation is
possible in such cases.

The methods described herein can be used to study the effects of
multiple genetic polymorphisms on a resultant protein structure. Multiple
mutations are common in HIV and other viruses, which makes sequence
correlation difficult. By observing the structural effects of the mutations
on the resulting protein, it is possible to look at the net effect of all
structural changes and to consider the overall structure of the protein in
drug design studies. For example, a mutation might occur in the active
site, or site of drug action, in a protein. Additionally, there may be related
mutations in other parts of the protein structure, which might not be
identified from a single point mutation correlation. These related
mutations could have an effect on biological activity of the protein. By
looking only at the active site, it might be predicted that a drug or
potential drug would not bind to the protein. The additional mutation,
however, might cause compensatory structural changes in the protein
structure that alter its properties in a way that restores biological activity.

By computing 3-D protein structures from gene sequences
containing multiple polymorphisms, it is possible to more accurately
predict the effect of multiple sequence mutations on protein structure
and, thus, to obtain a better correlation between sequence and drug
resistance than by considering sequence correlations alone. This
information can be useful, for example, in understanding drug resistance
and can aid researchers and clinicians in developing new drug therapies to
overcome drug resistance.

The structures that are derived based on multiple genetic
polymorphisms can be used in structure-based drug design studies to
provide frameworks, or scaffolds, into which drug or potential drug
molecules can be docked. This permits the design of drugs that are
active against a wider range of structural variants and, thus, in more
patients or against a range of drug-resistant proteins.

5. Clinical Applications

A knowledge of the repertoire of structural differences arising from
genetic polymorphisms across the human population or specific
subpopulations can provide insight into the differing biological responses
in patients based on their genetic differences. For example, where clinical
data are available for patients having particular genetic polymorphisms,
this information can be associated with the 3-D protein structural variants
and used to find correlations between polymorphisms and observed drug
responses.

The methods provided herein can be used to design drug therapies
that bring about favorable clinical responses (or eliminate unfavorable
effects) in patients, to identify pharmacological effects of drugs in
different patient subpopulations (e.g., grouped by age, race or gender)
and to simulate clinical trials to increase the probability that the trials
will yield optimal results.

Because of the high cost of clinical trials, such studies are generally
focused on small patient populations. The structural analysis tools
described herein permit the extension of clinical trials to cover patient
populations not specifically included in the study. This is accomplished
through correlation of the structural variants derived from genetic
polymorphisms with clinical responses.

The molecular structures and databases described herein can also
find application in the understanding and prediction of clinical or
pharmacological drug responses, for example, efficacy, toxicity, dose
dependencies or side effects in patients. For example, relational
databases containing 3-D protein structural variants can provide a means
for managing and using the information to understand and predict clinical
responses in patients.

In other embodiments, observed clinical data from patients in a
clinical trial can be associated with the structural variant models for each
genetic polymorphism exhibited in the clinical subjects, for example, in a
structural polymorphism relational database. The correlation between the
structural variants and observed clinical effects can then be utilized to
predict clinical outcomes in patients who did not participate in the clinical
trial. For example, a structural variant model can be generated for a
patient based on a genetic polymorphism exhibited in the patient, and the
database can be mined to identify structurally similar variants for which
clinical results are known. Structural similarity can be determined, for
example, by superimposing the structures and measuring the
root-mean-square (RMS) differences between the structures or by using
pattern matching or motif searching algorithms. The results can be used to
predict clinical responses in the patient based on the clinical data
associated with the structurally similar variants.
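
The mining step described above can be illustrated with a short sketch. It is not the implementation used herein; it assumes that each structure is a list of (x, y, z) atom coordinates with identical atom ordering, and it superimposes only the centroids (a full least-squares fit would also optimize the rotation). The names `rmsd` and `similar_variants`, the toy coordinates and the clinical labels are all hypothetical.

```python
import math

def centroid(coords):
    """Mean position of a list of (x, y, z) atoms."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))

def rmsd(a, b):
    """RMS deviation between two structures after centroid superposition.

    Assumption: rotation is NOT optimized here, to keep the sketch short.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    ca, cb = centroid(a), centroid(b)
    total = 0.0
    for p, q in zip(a, b):
        total += sum(((p[i] - ca[i]) - (q[i] - cb[i])) ** 2 for i in range(3))
    return math.sqrt(total / len(a))

def similar_variants(query, database, tolerance):
    """Return (name, rmsd, response) for entries within tolerance, closest first."""
    hits = []
    for name, (coords, response) in database.items():
        d = rmsd(query, coords)
        if d <= tolerance:
            hits.append((name, d, response))
    return sorted(hits, key=lambda h: h[1])

# Hypothetical toy data: three-atom "structures" with made-up clinical labels.
database = {
    "variant_A": ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], "responder"),
    "variant_B": ([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)], "non-responder"),
}
patient = [(5.0, 5.0, 5.0), (6.1, 5.0, 5.0), (5.0, 6.0, 5.0)]
hits = similar_variants(patient, database, tolerance=0.5)
for name, d, response in hits:
    print(name, round(d, 3), response)
```

In this toy case only `variant_A` falls within the tolerance, so its recorded response would be the one carried over to the patient.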

The predicted correlations can also be used to aid in the design of
subsequent clinical trials. The follow-on trials can be made more effective
through the judicious selection of patients with given genotypes (i.e.,
those exhibiting the same genetic polymorphisms), as guided by the
structurally predicted outcomes. For example, a clinical trial can be
designed based on a subpopulation of clinical subjects who exhibit a
specific genetic polymorphism (i.e., structural variant) to demonstrate the
effectiveness of a given therapeutic on a targeted population.

In other embodiments, the methods provided herein can be used in
the selection of drug therapies for patients exhibiting a particular genetic
polymorphism. This is accomplished by generating the structural variant
model associated with the polymorphism, docking drug molecules that
might be used to treat the patient into the structural variant model and
calculating the binding energy of each drug with the variant. The results
of docking or free energy calculations can be correlated to clinical data,
for example, patient population (e.g., ethnic background, race, sex, age),
treatment regimen, patient response to a particular drug or duration of
treatment. The binding energies can be compared, for example, to
determine which drug would best bind to the variant in order to identify
the drug that could best be used to treat the patient to optimize biological
activity.
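
The comparison of binding energies can be sketched as follows. This is an illustration only; it assumes that a docking or free energy calculation has already produced one energy per drug for the patient's structural variant and that a more negative energy indicates more favorable binding. The drug names and energy values are hypothetical.

```python
def rank_drugs(binding_energies):
    """Sort drug names from most to least favorable (most negative) energy."""
    return sorted(binding_energies, key=binding_energies.get)

# Hypothetical binding energies (kcal/mol) for one structural variant.
energies = {"drug_1": -7.2, "drug_2": -9.8, "drug_3": -5.1}

ranking = rank_drugs(energies)
best = ranking[0]
print("ranking:", ranking)
print("candidate therapy:", best)
```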

D. Creation of 3-D Structural Polymorphism Databases 

The above-noted methods all rely upon the use of databases of
nucleic acid sequences. Any such database known to those of skill in the
art may be employed; numerous such databases are publicly available
(e.g., the Stanford HIV database). The Stanford HIV database is a hierarchical
database with information about HIV patients who received or did not
receive protease inhibitor treatments, patient-dates, isolates, sequences,
hyperlinks to MEDLINE and GenBank abstracts, and art. This database,
however, does not contain 3-D protein structures of any proteins,
including HIV reverse transcriptase (RT) and HIV protease (PR; see, e.g.,
Shafer et al. (1999) Nucleic Acids Res. 27:348-352; Shafer et al. (1999)
J. Virol. 73:6197-6202; http://hivdb.stanford.edu/hiv; Richter (January
20, 1999) "AIDS drugs found to be effective in the world's most common
HIV strains").

Databases of sequences and associated information may also be
generated as described herein by obtaining samples and sequences from a
variety of sources. In all instances, further databases are generated by
then calculating 3-D structural models of the encoded proteins or relevant
portions thereof, such as active binding sites, from the nucleic acid
sequence information. It is these databases of nucleic acid sequence
and/or primary protein sequence and the associated 3-D structure that are
provided herein and that are used in all of the methods provided herein,
except for the computational phenotyping discussed below, which does
not require a database. Hence, databases containing computationally
determined 3-D structures of polymorphic proteins or portions thereof are
provided herein. These databases serve as tools in a variety of methods,
including those provided herein.

Databases that include 3-D structures for variant proteins encoded
by the nucleic acids that contain polymorphisms are provided. These are
generated after 3-D structural models are constructed for the protein
structural variants, preferably for all of the protein structural variants,
representing the genetic polymorphisms, by inputting the atomic
coordinates into a structural polymorphism database, preferably a
relational database, optionally with associated structural and/or
physical properties (e.g., phi/psi and side-chain angles and energetics),
and other data, if available, including, but not limited to, historical
data, such as parental medical histories, and clinical data. The resulting
database is used in structure-based drug design studies and for clinical
analyses. Figure 11 is a tabulation of the 3-D coordinates of a
representative entry, an HIV protease, that is encoded by the DNA in one
of SEQ ID Nos. 3-74 and 77-117, and that is an entry in an exemplary
database that includes 3-D structures. Exemplary databases that contain
the nucleic acid sequences and structures of all proteins encoded by
SEQ ID Nos. 3-117, as well as additional nucleic acids, are provided herein
and are described in the EXAMPLES.

A database is preferably interfaced to a molecular graphics package
that includes 3-D visualization and structural analysis tools, to analyze
similarities and variations in the protein structural variant models (see
copending U.S. application Serial No. 09/531,995, which is published as
International PCT application No. WO 00/57309, and is a continuation-in-part
of U.S. application Serial No. 09/272,814, filed March 19, 1999).
Briefly, International PCT application No. WO 00/57309 provides a
database and interface for access to 3-D molecular structures and
associated properties, which can be used to facilitate the design of
potential new therapeutics. The interface also provides access to other
structure-based drug discovery tools and to other databases, such as
databases of chemical structures, including fine chemical or combinatorial
libraries, for use in structure-focused high-throughput screening, as well
as to a host of public domain databases and bioinformatics sites. This
interface can be modified as needed to adapt for use with a particular
database.

A relational database that collects multiple data files relating to the
same molecular structure in the same subdirectory and that provides an
interface to access all of the collected files from the same structure using
the same user interface program is also provided. The collected files
include a variety of information and computer file formats, depending on
the type of information to be conveyed to users of the database. In
practice, a user communicates over a public network, such as the
Internet, or over a controlled network, such as an intranet, with a secure
file server that controls access to the collected files, and the interface to
the collected files is provided by a standard graphical user interface
program that is widely available. In this way, a convenient means of
searching molecular structure data for characteristics of interest is
provided. Data searching, file viewing, and investigation of multiple
representations of molecular structures from within a single viewing
program can also be performed using the database and interface.

The data files can be those available over a wide network such as
the Internet, and a suitable graphical user interface can be designed or
obtained. One such interface for viewing the data files is a standard
Internet web browser program, such as the web browser products by
Netscape Communications, Inc. and Microsoft Corporation that are
distributed free of charge. Such browser products readily import and
provide views of files having a wide variety of formats that contain
alphanumeric, video, and audio data. A security server, preferably
located between the user browser program at a network client machine
and the database, controls access to the database, which is housed at a
file server connected to the security server. Before a user gains access to
the database, the security server checks authorization for the individual
user and then, if appropriate, permits downloading of appropriate data
from the database file server. It is contemplated that the databases
containing 3-D structures of proteins or portions thereof that exhibit
polymorphism will be loaded into such a system.

Data for a molecular structure is loaded into the database by
specifying the file pathnames for the various data files that contain the
different types of data, including the different molecule views. Using a
browser to view the data files permits various helper applications, called
plug-ins, to smoothly and transparently accept the different file formats
and provide views to the user. The various data files of the database are
organized in accordance with the database design when they are loaded
into the database and are managed by a relational database management
program.
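
A relational layout of the kind described above might be sketched as follows. This is only an illustration under stated assumptions: it uses the SQLite engine bundled with Python rather than the database management software named herein, and the table names (`variant`, `atom`, `clinical_observation`), column names and inserted values are hypothetical placeholders.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variant (
    variant_id INTEGER PRIMARY KEY,
    protein    TEXT NOT NULL,      -- e.g., HIV PR or HIV RT
    sequence   TEXT NOT NULL       -- primary sequence of the variant
);
CREATE TABLE atom (
    variant_id INTEGER NOT NULL REFERENCES variant(variant_id),
    serial     INTEGER NOT NULL,
    name       TEXT NOT NULL,
    x REAL, y REAL, z REAL,        -- atomic coordinates of the 3-D model
    PRIMARY KEY (variant_id, serial)
);
CREATE TABLE clinical_observation (
    variant_id INTEGER NOT NULL REFERENCES variant(variant_id),
    drug       TEXT NOT NULL,
    outcome    TEXT NOT NULL       -- e.g., resistant or sensitive
);
""")

# Placeholder rows; none of these values come from the specification.
conn.execute("INSERT INTO variant VALUES (1, 'HIV PR', 'SEQ_PLACEHOLDER')")
conn.executemany(
    "INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, "N", 0.0, 0.0, 0.0), (1, 2, "CA", 1.46, 0.0, 0.0)],
)
conn.execute("INSERT INTO clinical_observation VALUES (1, 'drug_1', 'resistant')")

# Join structure and clinical data for each variant.
rows = conn.execute("""
    SELECT v.protein, c.drug, c.outcome, COUNT(a.serial)
    FROM variant v
    JOIN clinical_observation c ON c.variant_id = v.variant_id
    JOIN atom a ON a.variant_id = v.variant_id
    GROUP BY v.variant_id, c.drug, c.outcome
""").fetchall()
print(rows)
```

Keeping coordinates and clinical observations in separate tables keyed by the variant lets either be added later without touching the other.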

In addition to 3-D protein structures and associated primary
sequences, as provided herein, the database can optionally contain
associated biological or clinical data, such as drug resistance, side
effects, efficacy, pharmacokinetics and other data, that correlate with or
can be correlated with the structural variants. This information will be
used for correlating observed clinical effects to specific structural
variants and for predicting clinical responses and outcomes based on a
patient's structural variants, i.e., genetic polymorphisms.

Structural analysis tools are preferably integrated with the
structural database for comparing and analyzing the resulting protein
structural variant models. For example, the molecular graphics software
package described in International PCT application No. WO 00/57309
includes structural analysis capability to measure the structural attributes
of the model (distances, angles, etc.), to analyze sequences and
secondary structures, to study physical properties such as
hydrophobicity, electrostatic potential, and active or reactive sites in the
protein, as well as to evaluate the quality of the structure (both
conformationally and energetically).

Structures can also be compared by aligning them, such as by
performing a least squares fitting of the x-, y- and z-coordinates of each
of the structural variant models and superimposing the structures, or by
any other alignment method or structural comparison method. For example,
the structures of the variants can be clustered, or grouped together,
based on structural similarity. This can save time over studying each
structural variant independently because, where structures are considered
to be similar enough that they are clustered together (e.g., if their
structures can be superimposed within a specified tolerance), then only a
representative structure, or perhaps an average structure or scaffold,
which is derived as a composite of the individual structural variant
models, can be used in further drug design studies.
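
The clustering idea can be sketched as follows. This is a simple greedy scheme chosen for illustration, not necessarily the method used herein; it assumes that the structural variant models have already been superimposed and share the same atom ordering. The function names, the tolerance and the toy coordinates are hypothetical.

```python
import math

def rmsd(a, b):
    """RMS deviation between two pre-superimposed structures."""
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))

def cluster(structures, tolerance):
    """Greedy clustering: a structure joins the first cluster whose
    representative lies within `tolerance`; otherwise it starts a new one."""
    clusters = []  # list of (representative_name, [member_names])
    for name, coords in structures.items():
        for rep, members in clusters:
            if rmsd(structures[rep], coords) <= tolerance:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))
    return clusters

def average_structure(structures, members):
    """Composite scaffold: per-atom mean of the member coordinates."""
    n = len(members)
    natoms = len(structures[members[0]])
    return [
        tuple(sum(structures[m][k][i] for m in members) / n for i in range(3))
        for k in range(natoms)
    ]

# Hypothetical two-atom toy models.
structures = {
    "v1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "v2": [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
    "v3": [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
}
clusters = cluster(structures, tolerance=0.5)
scaffold = average_structure(structures, clusters[0][1])
print(clusters)
print(scaffold)
```

Here `v1` and `v2` fall into one cluster and `v3` forms its own, so only two structures (or the averaged scaffold plus `v3`) would need to be carried into further studies.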

Tools for database searching can also be included in the software
package. These can be used to query the database for structural variant
models having similar properties, such as molecular structure or sequence
similarity. These tools are used, for example, to mine the database to
identify variant models that are structurally similar (e.g., to find structures
that overlap within a specified tolerance), and thus would be predicted to
interact in the same way with potential drugs or exhibit the same clinical
response. This information could be useful in understanding the
structural or clinical effects of different genetic polymorphisms and could
potentially save time and money by extending the results of previously
performed clinical or computer-based drug design studies to predict the
results of studies on similar structural variants that have not yet been
performed.

1. Exemplary Databases

Databases containing data representative of the 3-D structure of
structural variants encoded by a selected gene or genes or the 3-D
structure of other polymorphic variants are provided. The selected genes
can be drug targets, such as receptors and genes of infectious agents,
such as the HIV protease or reverse transcriptase. Exemplary databases
are presented in Example 5, which describes the construction, interface,
use and applications of HIV PR and RT databases. These databases may
be stored on any suitable medium and used in any suitable computer
system. Systems and methods for generating, storing and processing
databases are well known.

2. Computer systems 

Computer systems for processing the databases and computer
systems containing the databases are provided. The processing that
maintains the database and performs the methods and procedures using
the databases may be performed on multiple computers, or may be
performed by a single, integrated computer. For example, the computer
through which data is added to the database may be separate from the
computer through which the database is sorted or analyzed, or may be
integrated with it. Each computer operates under control of a central
processor unit (CPU), such as a "Pentium" microprocessor and associated
integrated circuit chips, available from Intel Corporation of Santa Clara,
California, USA. A computer user can input commands and data from a
keyboard and mouse and can view inputs and computer output at
a display. The display is typically a video monitor or flat panel display
device. The computer also includes a direct access storage device
(DASD), such as a fixed hard disk drive. The memory typically includes
volatile semiconductor random access memory (RAM). Each computer
preferably includes a program product reader that accepts a program
product storage device from which the program product reader can read
data (and to which it can optionally write data). The program product
reader can include, for example, a disk drive, and the program product
storage device can comprise removable storage media such as a magnetic
floppy disk, an optical CD-ROM disc, a CD-R disc, a CD-RW disc, or a
DVD data disc. If desired, computers can be connected so they can
communicate with each other, and with other connected computers, over
a network. Each computer can communicate with the other connected
computers over the network through a network interface (see, e.g.,
Examples below) that permits communication over a connection between
the network and the computer.

The computer operates under control of programming steps that
are temporarily stored in the memory in accordance with conventional
computer construction. When the programming steps are executed by
the CPU, the pertinent system components perform their respective
functions. Thus, the programming steps implement the functionality of
the system as described above. The programming steps can be received
from the DASD, through the program product reader, or through the
network connection. The storage drive can receive a program product,
read programming steps recorded thereon, and transfer the programming
steps into the memory for execution by the CPU. As noted above, the
program product storage device can include any one of multiple
removable media having recorded computer-readable instructions,
including magnetic floppy disks and CD-ROM storage discs. Other
suitable program product storage devices can include magnetic tape and
semiconductor memory chips. In this way, the processing steps
necessary for operation can be embodied on a program product.

Alternatively, the program steps can be received into the operating
memory over the network. In the network method, the computer receives
data including program steps into the memory through the network
interface after network communication has been established over the
network connection by well known methods that will be understood by
those skilled in the art without further explanation.

The computer that implements the client side processing, and the
computer that implements the server side processing, or any other
computer device of the system, may comprise any conventional computer
suitable for implementing the functionality described herein. FIGURE 9 is
a block diagram of an exemplary computer device 900 such as might
comprise any of the computing devices in the system. Each computer
operates under control of a central processor unit (CPU) 902, such as an
application specific integrated circuit (ASIC) from a number of vendors, or a
"Pentium"-class microprocessor and associated integrated circuit chips,
available from Intel Corporation of Santa Clara, California, USA. Commands
and data can be input from a user control panel, remote control device, or a
keyboard and mouse combination 904, and inputs and output can be viewed
at a display 906. The display is typically a video monitor or flat panel display
device.

The computer device 900 may comprise a personal computer or, in
the case of a client machine, the computer device may comprise a Web
appliance or other suitable Web-enabled device for viewing Web pages. In
the case of a personal computer, the device 900 preferably includes a direct
access storage device (DASD) 908, such as a fixed hard disk drive (HDD).
The memory 910 typically comprises volatile semiconductor random access
memory (RAM). If the computer device 900 is a personal computer, it
preferably includes a program product reader 912 that accepts a program
product storage device 914, from which the program product reader can
read data (and to which it can optionally write data). The program product
reader can comprise, for example, a disk drive, and the program product
storage device can comprise removable storage media such as a floppy disk,
an optical CD-ROM disc, a CD-R disc, a CD-RW disc, a DVD disc, or the like.
Semiconductor memory devices for data storage and corresponding readers
may also be used. The computer device 900 can communicate with the
other connected computers over a network 916 (such as the Internet)
through a network interface 918 that enables communication over a
connection 920 between the network and the computer device.

The CPU 902 operates under control of programming steps that are
temporarily stored in the memory 910 of the computer 900. When the
programming steps are executed, the pertinent system component performs
its functions. Thus, the programming steps implement the functionality of
the system illustrated in FIGURE 1. The programming steps can be received
from the DASD 908, through the program product 914, or through the
network connection 920, or can be incorporated into an ASIC as part of the
production process for the computer device. If the computer device includes
a storage drive 912, then it can receive a program product, read
programming steps recorded thereon, and transfer the programming steps
into the memory 910 for execution by the CPU 902. As noted above, the
program product storage device can comprise any one of multiple removable
media having recorded computer-readable instructions, including magnetic
floppy disks, CD-ROM, and DVD storage discs. Other suitable program
product storage devices can include magnetic tape and semiconductor
memory chips. In this way, the processing steps necessary for operation in
accord with the methods herein can be embodied on a program product.

Alternatively, the program steps can be received into the operating
memory 910 over the network 916. In the network method, the computer
receives data including program steps into the memory 910 through the
network interface 918 after network communication has been established
over the network connection 920 by well-known methods that will be
understood by those skilled in the art without further explanation. The
program steps are then executed by the CPU 902 to implement the
processing of the system.

To implement the functionality described herein, it has been found
that a suitable computer for performing database server tasks includes a
"Pentium" level CPU having at least 128 MB of memory, 30 GB of disk
storage, and 256 MB of disk swap space for files. A recommended
configuration for computer performance would include, for example, a
"Pentium III" processor at 700 MHz or faster, memory of 256 MB or
greater, disk storage space of 50 GB or more, and swap space of 500 MB
or more. A suitable configuration for performing user tasks as described
above includes a "Pentium" level CPU having 128 MB memory, disk
space of 240 MB with swap space of 256 MB, and an optional display
circuit card supporting OpenGL and having 4 MB of memory. A
recommended configuration includes, for example, a "Pentium III"
processor at 500 MHz or faster, memory of 256 MB or greater, disk
space of 500 MB or more, swap space of 500 MB or more, and an
optional display card having 8 MB of memory or more, supporting
resolution of 1024 x 768.

In a preferred embodiment, the software used in the computing 
system described above includes, for the server machine, operating 
system software such as "Windows NT Server 4.0" from Microsoft 



WO 01/35316 



PCT/US00/30863 




Corporation, with Service Pack 5, Version 1280 (10 June 1999) or more
recent, with database management server software such as, but not
limited to, "Oracle Server Standard Edition 8.1" from Oracle Corporation.
The software used in a preferred embodiment of the user machine
includes operating system software such as "Windows NT Workstation
4.0" from Microsoft Corporation, with Service Pack 5, Version 1280 (10
June 1999) or more recent, as well as "Oracle Client Standard Edition
Version 8.1" or higher. The client machine will also be compliant with
the "Java" programming language (Java Runtime Environment 1.2.2). As
will be known to those skilled in the art, other configurations may be
suitable, depending on the applications being used and the computer
performance desired.

E. Computational phenotyping

Also provided herein is a method designated computational
phenotyping (also referred to herein as in silico phenotyping). This
refers to the method in which a 3-D protein structure
is generated from a given genotype and protein-drug binding analyses in
silico (computationally) are performed in order to determine whether drug
binding does (i.e., sensitive) or does not (i.e., resistant) take place. This
type of analysis is contemplated to be performed for an individual patient
or subject or groups thereof (such as ethnic groups, gender-based or
age-based groups, particular species or groups thereof) to assess or select a
drug for treatment of a particular disease or other such use, and is done
to assess efficacy of a particular drug on a desired target, where the
target exhibits polymorphisms. The following discussion and example,
below, are with reference to HIV PR and RT, but it is understood that the
methods and applications can be applied to any protein or gene product
that exhibits polymorphic variation, and particularly to gene products that
are drug targets.




In addition to computational phenotyping, there are three
distinct methodologies that are clinically useful for determining either
resistance or sensitivity to particular HIV-1 antiviral therapeutics. These
are: genotyping, phenotyping, and virtual phenotyping. These
methodologies are used to optimize the choice of therapeutics during the
initiation of therapy, after drug failure, and/or during salvage therapy.
Genotyping involves extracting the HIV viral RNA, amplifying all or
part of the genes encoding the protease and reverse transcriptase
proteins, and sequencing them in order to assess the presence of
resistance-associated mutations.

In phenotyping, the amplified sequences are instead sub-cloned into
expression vectors and then tested for their replicative ability in vitro by
transfecting them into cultured and/or established cell lines, such as, for
example, human T cells, monocytes, macrophages, dendritic cells,
Langerhans cells, hematopoietic stem cells, HeLa, XC, Mm5MT, LTL,
COS 7, NIH3T3, LTA, MCF-7, or other cells derived from human tissues
and cells that are the principal targets of viral infection, in the
presence or absence of antiviral drugs (see, e.g., U.S. Patent No.
5,837,464; see, also EP 0852626; EP 1012334; and EP 0877937).

Virtual phenotyping (ViroLogic, Inc.) is an interpretive service in which the
phenotype of a specimen (i.e., of a plant, animal, pathogen, or human) is
inferred from the specimen's genotype based upon an extensive
correlative database of known genotypes and phenotypes. Such a
correlative database must be updated constantly to maintain clinical
accuracy.

Similar to virtual phenotyping, computational or in silico
phenotyping infers phenotype based upon specimen genotype.
Computational phenotyping is distinct from virtual phenotyping in that
sensitivity or resistance to drugs is determined directly through
protein-drug binding analysis performed in silico and not through
correlation with a database of known genotypes and phenotypes. The
advantage of computational phenotyping is that new resistance-conferring
mutations can be discovered rapidly and in "real time" without the need
for phenotyping to train the genotype. Moreover, in silico phenotypes are
not subject to error caused by compensatory mutations, which may act
synergistically or anti-synergistically with resistance-associated
mutations to increase, decrease, or reverse specific drug resistances.
Computational phenotyping will generate information that can, for
example, be presented in a report that is marketed within the in vitro
diagnostics industry as an adjunct test/service to help optimize therapy
and assist physicians, farmers, academic institutions, government
agencies, and industries with specimen treatment. Thus, a computer-based
method for predicting clinical responses (e.g., drug sensitivity or drug
resistance) in patients, plants, animals, pathogens, and microorganisms
based on genetic polymorphisms is provided.

The genotypes used in the methods are obtained from any source,
including, but not limited to, a plant, animal, pathogen, or
mammal, with the most preferred source being a mammal, particularly a
human for whom a particular drug treatment is contemplated. The
genotype is that of the drug target, such as, as exemplified herein, HIV RT
or PR from a particular infected individual. Other exemplary drug targets are
proteins, polypeptides and oligopeptides, including, but not limited to, a
receptor, enzyme, hormone, and any such compound with which drugs or
other ligands interact to bring about a biological response. For
exemplification of this method, the protein considered is an enzyme, in
particular HIV protease (PR) and reverse transcriptase (RT), which are
therapeutic drug targets. Nucleic acid encoding the target from an
individual sample, such as a blood sample or other body fluid sample from a
mammal, such as a human patient, is sequenced, and the 3-D structure
of the encoded target is determined. The drug of interest is computationally
tested to assess whether it interacts with the target.

The following examples are included for illustrative purposes only
and are not intended to limit the scope of the invention.

EXAMPLE 1 

BINDING CORRELATIONS OF MUTANT FORMS OF HCV PROTEASE 
WITH DIFFERENT INHIBITORS 

This example provides the results of a theoretical study of NS3
protease complexes with two known peptide inhibitors (see SEQ ID Nos.
1 and 2; Ingallinella et al. (1998) Biochemistry 37:8906-8914).

Introduction 

During HCV replication, the final steps of processing are performed
by a virally encoded chymotrypsin-like serine protease, NS3. The HCV
polyprotein is an approximately 3000 amino acid protein that contains, from
the amino terminus to the carboxy terminus, a nucleocapsid protein (C), envelope
proteins (E1 and E2) and several non-structural proteins (NS1, 2, 3, 4a,
4b, 5a and 5b). NS3 is an approximately 68 kDa protein, encoded by
approximately 1893 nucleotides of the HCV genome, and has two distinct
domains: (a) a serine protease domain containing approximately 200 of
the N-terminal amino acids; and (b) an RNA-dependent ATPase domain at
the C-terminus of the protein. The NS3 protease is considered a member
of the chymotrypsin family and is a serine protease that is responsible for
proteolysis of the polypeptide (polyprotein) at the NS3/NS4a, NS4a/NS4b,
NS4b/NS5a and NS5a/NS5b junctions, which generates four viral
proteins during viral replication. This protease is inhibited by N-terminal
cleavage products of substrate peptides. The NS3 protease, which is
necessary for polypeptide processing and viral replication, has been
identified, cloned and expressed (see, e.g., U.S. Patent No. 5,712,145).
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Active NS3 forms a heterodimer with a polypeptide cofactor NS4A. 
The crystal structure of NS3 with and without the NS4A cofactor is 
known (see, e.g., Love et al. (1996) Cell 87:331-342; Habuka et al.
(1997) Jikken Igaku 75:2308-2313; Yan et al. (1998) Protein Sci.
7:837-847, which provides the structure with NS4A).

The NS3 protease is a target for design of antiviral drugs. For 
example, a series of potent hexapeptide inhibitors of NS3 has been 
developed by optimization of the product inhibitors (Ingallinella et al.
(1998) Biochemistry 37:8906-8914).

Analyses 

Models of the complexes of NS3 with the two protease inhibitor
peptides were obtained by flexible docking of the peptides into the active
site of the crystal structure of NS3/4A, followed by evaluation of
protein-peptide binding energies. The models were tested by in situ modification
of the docked ligands. A qualitative agreement between the binding
energies and inhibitor IC50 values obtained from the literature was found.
The peptides studied were:

Sequence*                                   IC50, nM   SEQ ID
Ac-Asp1-D-Glu2-Leu3-Ile4-Cha5-Cys6-COO-     15         1
Ac-Asp1-L-Glu2-Leu3-Ile4-Cha5-Cys6-COO-     60         2

* Cha = β-cyclohexylalanine

In the modeling studies, it was assumed that:

the high-affinity inhibitory peptides 1 and 2 have a similar mode of
binding to the active site of NS3;

the minimum binding pharmacophore includes the SH group of Cys6
and the carboxyl groups of Asp1, Glu2 and Cys6; and

the side chains of residues 3, 4 and 5 may enhance binding by
non-specific hydrophobic interaction with NS3.
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Methods 

Initial structure of the NS3-peptide complex 

The crystal structure of NS3 with a peptide cofactor NS4A was
obtained from the art (Kim et al. (1996) Cell 87:343) and was used in
the studies with peptide inhibitors. The crystal structure of NS3/NS4A
was regularized using the molecular mechanics methods described herein.
Initial NS3-NS4A-peptide complexes were constructed by placing the peptides
into the NS3 binding site expected by structural homology to other serine
proteases:

the C-terminal carboxyl was placed near the oxyanion-stabilizing
site (residues 137-139);

the side chain of Cys6 was inserted into the hydrophobic cavity
formed by L135, F154 and A157; and

the ε-amino group of K136 was placed in contact with the
C-terminal carboxyl (see Kim et al. (1996) Cell 87:343; Steinkuhler et al.
(1998) Biochemistry 37:8899).

Monte Carlo simulations
Monte Carlo simulations 

In order to optimize the complexes, Biased Probability Monte
Carlo (BPMC) simulations (Abagyan et al. (1994) J. Mol. Biol. 235:983)
were performed on the NS3-peptide complexes using the ICM program
(commercially available from MolSoft, San Diego, CA) with the ECEPP/3 force
field and atomic solvation energies (Momany et al. (1975) J. Phys. Chem.
79:2361; Nemethy et al. (1992) J. Phys. Chem. 96:6472; Abagyan et al.
(1997) Computer Simulations of Biomedical Systems: Theoretical and
Experimental Applications, vol. 3, Kluwer Academic Publishers,
Dordrecht, The Netherlands, p. 363). The sampling method was BPMC
with random change of one variable at a time. A Metropolis acceptance
criterion was applied after energy minimization (quasi-Newton, up to 1000
steps). Simulations were performed at a temperature of 1000 K. The
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peptide translational and rotational degrees of freedom, all peptide torsion 
angles and χ angles of the protein side-chains located within 7.0 Å of any
peptide atom were varied during the BPMC simulations. 
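
The Metropolis acceptance step described above can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the ICM implementation: the function name `metropolis_accept`, the kcal/mol energy units, and the injectable random-number source are hypothetical choices made for the example. Only the simulation temperature of 1000 K is taken from the text.

```python
import math
import random

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def metropolis_accept(e_old, e_new, temperature_k=1000.0, rng=random.random):
    """Decide whether a trial conformation is accepted.

    e_old, e_new: energies (kcal/mol, assumed units) of the current and the
    locally minimized trial conformation.
    rng: callable returning a uniform number in [0, 1); injectable for testing.
    """
    delta = e_new - e_old
    if delta <= 0.0:
        # Downhill or equal-energy moves are always accepted.
        return True
    # Uphill moves are accepted with Boltzmann probability exp(-dE / RT).
    return rng() < math.exp(-delta / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    random.seed(0)
    trials = 10_000
    accepted = sum(metropolis_accept(0.0, 2.0) for _ in range(trials))
    print(f"accepted {accepted} of {trials} uphill moves of +2 kcal/mol")
```

At 1000 K the product RT is roughly 2 kcal/mol, so uphill moves of a few kcal/mol are accepted fairly often, which is in keeping with the use of a high sampling temperature combined with local minimization.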
The energy function used in the MC simulations included:

ECEPP/3 terms for energy in vacuo (VDW (van der Waals), H-bond,
electrostatic and torsion potentials);

distance-dependent electrostatics with ε0 = 4.0; and

surface energy with atomic solvation parameters.

The total energies of the complexes were calculated including
contributions from: ECEPP/3 VDW, H-bond, S-S bond and torsion terms;
exact-boundary electrostatic energy with ε0 = 8.0; and side-chain
entropies. Hydrophobic free energies were estimated as sA, where A is the
accessible surface area and s is a tension constant of 0.03 kcal/(mol·Å²).

Strategy of the flexible Monte Carlo docking

The simulations proceeded with multiple, relatively short MC runs
(2000-5000 generated structures). New docking cycles were started
from the lowest-energy or other interesting structures found in previous
runs. Structures saved during various MC runs were sorted by total
energies and RMSD (root-mean-squared deviation), and compressed into a
cumulative conformational stack. Binding energies were calculated for
representative structures of each complex thus obtained. This strategy
was more efficient than continuous long simulations because the variable
torsion angles and distance constraints are defined for an initial structure
and do not change during the MC run.
Binding energies of the peptide-protein complexes

For low-energy conformations found after several iterative BPMC
cycles, peptide-protein binding energies were estimated using the
equation:

$$E_{\mathrm{bind}} = E_0 + E_{\mathrm{comp}} - E_{\mathrm{pept}} - E_{\mathrm{prot}},$$

where $E_{\mathrm{comp}}$ is the energy of the complex, $E_{\mathrm{pept}}$ and $E_{\mathrm{prot}}$ are the separate
energies of the peptide and protein, respectively, and $E_0$ is an adjustable
constant.

The binding energy function included: exact-boundary electrostatic
free energy contributions; side-chain entropy; and surface tension
hydrophobic free energy terms (Zhou and Abagyan (1998) Folding
Design 3:513; Schapira et al. (1999) J. Mol. Recognition 12:177).
ECEPP/3 hydrogen-bonding terms were included with a weight of 0.5.
Results

Models of the NS3-peptide complexes

RMSD between pharmacophore atoms of peptides 1 and 2 were
calculated for all pairs of BPMC structures. Two models of the
NS3-peptide complexes were selected assuming (1) similar positions of
pharmacophore groups of the two peptides in the binding site (RMSD ≤ 2.0
Å) and (2) low binding energy of the complexes ($\Delta E_{\mathrm{bind}}$ < 5.0 kcal/mol).
The two models of the NS3-peptide complex were selected by visual
inspection.

Characteristics of the binding sites for peptide inhibitors in two 
NS3-peptide complex models are summarized in Table 1. 

Table 1

Site  Peptide      NS3 residue, group   Type of       Present for Peptide
      residue                           interaction   Model 1   Model 2
P1    Cys6 COO-    K136 NH3+            H-bond/el.    1,2       1,2
                   G137 NH              H-bond        1,2       2
                   S139 OH              H-bond        1,2       2
      Cys6 SH      L135, F154, A157     hydroph       1,2       1,2
P2    Cha5         H57, R155, A156      hydroph       1,2
                   A157, V158           hydroph                 2
P3    Ile4         V132, S133           hydroph       1,2       2
                   V158, C159           hydroph                 1
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more preferably isoleucine; residue 51 is an aliphatic amino acid, more 
preferably glycine; residue 52 is an aliphatic amino acid, more preferably 
glycine; residue 78 is an aliphatic amino acid, more preferably glycine; 
residue 79 is an aliphatic amino acid, more preferably proline; residue 80 
is a hydrophilic amino acid, more preferably threonine; residue 81 is an
aliphatic amino acid, more preferably proline; residue 94 is an aliphatic
amino acid, more preferably glycine; residue 95 is a thio-containing amino
acid, more preferably cysteine; residue 96 is a hydrophilic amino acid, more
preferably threonine; residue 97 is a hydrophobic amino acid, more
preferably leucine; residue 98 is a hydrophilic amino acid, more preferably
asparagine; and residue 99 is an aromatic amino acid, more preferably
phenylalanine. These invariant regions can subsequently be used to
assist in the design of drugs or therapeutic agents that bind to the
invariant regions and disrupt the activity of the protease with greater
efficacy than drugs commonly used to treat HIV, where the free
energy of binding between the drug or therapeutic agent and the
structural invariant region is evaluated as described herein. The methods
described in this example can also be applied to HIV RT and to any
protein of interest that exhibits polymorphisms.

EXAMPLE 4

Computational Phenotyping of HIV-1 Protease and Reverse Transcriptase

Computational or in silico phenotyping is performed to assess
phenotypic properties of a protein. This example demonstrates
application of this method to HIV-1 protease and reverse transcriptase to
test the efficacy of various protease inhibitors for an HIV patient.

To practice this method, 3-D structures of HIV-1 protease and
reverse transcriptase based upon the nucleic acid isolated from HIV from
a patient are generated. Protein-drug binding analysis is performed in silico in order to



WO 01/35316 



PCT/US00/30863 






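Table 1 (continued)

Site  Peptide      NS3 residue, group   Type of       Present for Peptide
      residue                           interaction   Model 1   Model 2
P4    Leu3         Res. 157 to 160      hydroph       1,2       2
                   V132, S133           hydroph                 1
P5    Glu2 COO-    R161 guanidine       H-bond/el.              1,2
P6    Asp1 COO-    R161 guanidine       H-bond/el.    1,2
                   S133 OH              H-bond                  1,2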





Validation of the models: modifications of the protein and ligands
in the binding site

In order to validate the proposed models, the K136M mutation and
peptide modifications known from SAR (structure-activity relationship)
studies were performed in low-energy structures of the NS3-peptide 2
complex.

Positions of the modified ligand and conformations of adjacent
protein side chains were adjusted by energy minimization. Distance
restraints were applied to keep the ligand near its initial position.

Changes in calculated binding energies upon modifications, $\Delta E_{\mathrm{bind}}(\mathrm{calc})$,
were compared to the values expected from ratios of inhibitory
potencies, $\Delta E_{\mathrm{bind}}(\mathrm{exp})$:

$$\Delta E_{\mathrm{bind}}(\mathrm{exp}) = RT \ln\left(IC_{50}^{\mathrm{mod}} / IC_{50}^{0}\right),$$

where $IC_{50}^{0}$ and $IC_{50}^{\mathrm{mod}}$ are inhibitory potencies of the parent and modified
compounds.
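
As a worked illustration of the relation between an IC50 ratio and the expected change in binding energy, the short Python sketch below applies RT ln(IC50 mod / IC50 parent) to the two peptides listed earlier (15 nM and 60 nM). The function name and the temperature of 298.15 K are assumptions made for the example; the text does not state the temperature used, and treating peptide 1 as the parent compound is likewise only illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def expected_delta_e_bind(ic50_mod, ic50_parent, temperature_k=298.15):
    """Expected change in binding energy, RT * ln(IC50_mod / IC50_parent).

    Both IC50 values must be in the same units; the result is in kcal/mol.
    The default temperature is an assumed value, not taken from the text.
    """
    return R_KCAL_PER_MOL_K * temperature_k * math.log(ic50_mod / ic50_parent)


# Peptide 1 (IC50 = 15 nM) as parent, peptide 2 (IC50 = 60 nM) as modified.
delta = expected_delta_e_bind(60.0, 15.0)
print(f"{delta:.2f} kcal/mol")  # prints "0.82 kcal/mol"
```

A four-fold loss of potency therefore corresponds to less than 1 kcal/mol, which indicates the accuracy a calculated binding energy needs in order to rank such closely related ligands.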

The correlation between experimental and calculated changes in
binding energy upon ligand modifications in the binding site of NS3 is
illustrated in FIG. 4.
Discussion

The two NS3-peptide complex models suggest a common binding
pattern for the inhibitor P1 site (Cys6-OH), with the carboxyl group
hydrogen-bonded to the oxyanion hole residues G137 and S139, and the
Cys6 side chain embedded in a hydrophobic pocket formed by L135, F154
and A157.

This study confirms the possibility of hydrogen bonding between
the C-terminal carboxyl and the ε-amino group of K136 suggested by
Steinkuhler et al. ((1998) Biochemistry 37:8899) based on the K136M
mutation in NS3. Changes in calculated binding energies upon mutation
are consistent with an 8-fold increase in $K_i$ of an inhibitor with a free
carboxyl group and with the lack of an effect on binding when the peptide
is amidated.

The models differ in binding of the negatively charged side chains
in positions P5 and P6. The R161 guanidine interacts with a carboxyl
group of Asp1 and Glu2 in Models 1 and 2, respectively. In Model 2, the
Asp1 carboxyl also interacts with the hydroxyl of S133.

The models are in agreement with SAR data for peptide inhibitors
of NS3. Predicted changes in binding energy upon modification of the
protein and peptides correlate reasonably well with the changes expected
from IC50 ratios. Standard deviations of $\Delta E_{\mathrm{bind}}(\mathrm{calc}) - \Delta E_{\mathrm{bind}}(\mathrm{exp})$ were 0.8
and 1.6 kcal/mol for Models 1 and 2, respectively, with correlation
coefficients of 0.62. After the largest outlier was removed from each
dataset, correlations improved to 0.81 and 0.76, respectively.
Conclusions 

An effective iterative Biased Probability Monte Carlo protocol for 
the docking of flexible peptide ligands into a flexible protein active site 
has been developed. Two models of the complexes of HCV NS3 protease 
with potent peptide inhibitors were proposed based on the docking
simulations and on evaluation of protein-ligand binding energies. The 
models were validated by in situ modifications of NS3-peptide complexes 
and by correlation of binding energies of modified complexes with those 
expected from experimental IC 50 values. Proposed models can be used 
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for planning further mutagenesis studies of the HCV NS3 protease and 
the models can be used in the design of non-peptide inhibitors using 
structure-based drug design methodologies. 

EXAMPLE 2 

LEAD OPTIMIZATION BY RECEPTOR-BASED FREE ENERGY
QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIPS (QSARS) FOR
TNF RECEPTOR ANTAGONIST DISCOVERY

The goal of the modeling studies in this phase was to identify
binding modes and complex structures of the compounds that bind to
TNF receptor type I protein in order to guide the design of new
compounds. An approach was developed that relies on docking compounds to the
receptor, evaluating free energy changes of binding of the docked
structures, and comparing the calculated values with experimental
inhibition constants $K_i$ of the compounds. The success of
the calculations was assessed by evaluating the consistency of the
calculated free energy changes of binding and the experimental $K_i$.

The difference in free energy changes of binding between two
compounds with inhibition constants $K_i$ and $K_i'$ can be calculated as

$$\Delta\Delta G = -kT \ln\left(K_i / K_i'\right),$$

where $k$ and $T$ are Boltzmann's constant and absolute temperature,
respectively.

Thirteen active compounds were studied. Their potencies, as
measured by $K_i$, range from 0.1 to 30 µM, spanning about 3 kcal/mol in
free energy. It was found that the calculated free energy changes of
binding are highly consistent with the corresponding experimental values,
with a correlation coefficient of 0.966 and differences of less than 0.5 kcal/mol
(see Table 2 and FIG. 5). The predicted binding modes and complex
structures can thus be accepted with confidence.

To modify these compounds, important pharmacophore features on
the surface of the receptor that are critical for binding of the compounds
were identified. These features include a hydrophobic belt, a hydrophilic
belt and 3 hydrogen bond donor sites. A few potential hydrogen
bonding sites, which are not used by the current compounds, were also
derived, and can be used for designing more potent binders.

Graphics-guided redesign of the compounds was performed. The
free energy calculation was used to predict the binding activity of each
design. Fourteen new compounds were thus designed and binding
activities were predicted. The designed molecules were
synthesized and shown to have high affinity for the target. Some of them
exhibit a $K_i$ in the low-nanomolar range. Hence the method provided herein
for modification of drugs for binding to calculated 3-D structures of a
target protein resulted in redesigned drug candidates with enhanced
affinity for the target.

This approach has advantages over the traditional x-ray
crystallography method, which include the following:

(1) The binding modes are determined for a group of compounds
instead of a single compound; analysis of similarities and differences reveals
rich information on binding mechanisms.

(2) The predictive power of the free energy calculation is very
desirable for redesign of compounds.

(3) The correlation with the biochemical activities assures
relevancy of the explored binding modes, while a structure given by x-ray
crystallography may not necessarily be one related to the biological
functions of the compound.

A comparison of calculated relative free energy changes of binding
$\Delta\Delta A$ and experimental $\Delta\Delta G$ converted from inhibition constants $K_i$ (all in
kcal/mol) of the compounds (referenced by a code name) is presented in
Table 2.
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Table 2 



Compound    ΔΔA (calculated)   ΔΔG (experimental)
SBI-2030     0                  0
SBI-2002    -0.97              -1.25
SBI-2005    -0.72              -1.14
SBI-307     -0.56              -0.08
SBI-2008    -0.53              -0.82
SBI-2006    -0.34              -0.44
SBI-306     -0.07               0.40
SBI-2000     0.29               0.27
SBI-2001     0.72               1.12
SBI-304      1.55               1.45
SBI-308      1.70               1.78
SBI-305      1.86               1.67
SBI-2048     1.95               1.94



A comparison of calculated versus experimental binding free energy 
changes is given in FIG. 5. 
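
The agreement between the two columns of Table 2 can be checked with a few lines of Python. The sketch below is only an illustration: the values are transcribed from Table 2, while the helper name `pearson_r` and the choice of the Pearson coefficient as the correlation measure are assumptions, since the text does not say how the reported coefficient was computed.

```python
import math

# (compound, calculated, experimental) in kcal/mol, transcribed from Table 2.
TABLE_2 = [
    ("SBI-2030", 0.00, 0.00),
    ("SBI-2002", -0.97, -1.25),
    ("SBI-2005", -0.72, -1.14),
    ("SBI-307", -0.56, -0.08),
    ("SBI-2008", -0.53, -0.82),
    ("SBI-2006", -0.34, -0.44),
    ("SBI-306", -0.07, 0.40),
    ("SBI-2000", 0.29, 0.27),
    ("SBI-2001", 0.72, 1.12),
    ("SBI-304", 1.55, 1.45),
    ("SBI-308", 1.70, 1.78),
    ("SBI-305", 1.86, 1.67),
    ("SBI-2048", 1.95, 1.94),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


calc = [row[1] for row in TABLE_2]
expt = [row[2] for row in TABLE_2]
r = pearson_r(calc, expt)
max_abs_diff = max(abs(c - e) for c, e in zip(calc, expt))
print(f"r = {r:.3f}, largest |calc - exp| = {max_abs_diff:.2f} kcal/mol")
```

With the tabulated values, the coefficient computed this way comes out close to the 0.966 quoted in the text, and the largest deviation stays below 0.5 kcal/mol.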

EXAMPLE 3 

HIV Protease Models for Drug Studies

Antiviral therapy for AIDS has focused on the discovery and design of
inhibitors for two main enzyme targets of HIV-1: reverse transcriptase
(RT) and protease (PR). HIV RT is a heterodimer composed of p51 and
p66 subunits. The p51 subunit is composed of the first 450 amino acids
encoded by the RT gene and the p66 subunit is composed of all 560
amino acids of the RT gene. RT is responsible for RNA-dependent DNA
polymerization, RNaseH activity, and DNA-dependent DNA polymerization.
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HIV PR is a homodimer of two identical 99-amino acid chains. HIV 
PR is an aspartic proteinase that is responsible for the post-translational 
processing of the viral gag and gag-pol polyprotein gene products, which 
yields the structural proteins and enzymes of the viral particle (see, e.g., 
Erickson et al. (1996) Annu. Rev. Pharmacol. Toxicol. 36:545-571; Bouras
et al. (1999) J. Med. Chem. 42:957-962). Despite several promising new
anti-HIV agents, the clinical emergence of drug-resistant variants of HIV
limits the long-term effectiveness of these drugs. Genetic analysis of the
resistant forms of HIV has identified a number of critical mutations in the
RT and PR genes. Moreover, structural analysis of inhibitor-enzyme
complexes and mutational modeling studies can lead to a better
understanding of how these drug-resistant mutations exert their effects at
the structural and functional levels.

HIV-PR inhibitor computational binding studies 

This example provides the results of a computational study on HIV
PR. The 3-D protease structure was generated, docked with known viral
inhibitors, and analyzed via the free energy of binding studies described
herein. A quantitative agreement between the calculated and
experimental protease-drug binding energies was obtained. Moreover, a
series of 3-D HIV PR models were analyzed to identify the invariant
regions of the protease. These insights have implications for the design
of new drugs and therapeutic strategies to combat AIDS drug resistance.

Optimization of 3D structures

Five PR inhibitors approved by the FDA for clinical use were used:
saquinavir, nelfinavir, indinavir, amprenavir, and ritonavir (Figure 6).
Initial 3-D structures for the wild-type HIV PR complexes with these
FDA-approved inhibitors were obtained from the Protein Data Bank and were
then optimized using Monte Carlo (MC) simulations with an ECEPP/3
force field as described in Example 1. The energy function used in the
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MC simulations included: ECEPP/3 terms for energy in vacuo (van der 
Waals, H-bond, electrostatic and torsion potentials); distance dependent 
dielectrics with ε0 = 4.0; and surface free energy calculated using atomic
solvation parameters (Dudek et al. (1998) J. Computational Chem.
19:548-573; Wang et al. (1995) J. Mol. Biol. 253:473-492). Standard
ECEPP charges were used for the protein residues. Lys, Arg, Glu, and
Asp residues were charged. Charged and protonated states of Asp 125
(chain B) were considered as well. The inhibitors were docked into the
active site of the protease, and the protein-drug complexes were
energetically refined using the methods described in Example 1. Partial
charges for the inhibitors were calculated with the Gasteiger-Marsili
method implemented in SYBYL 6.5 (Tripos Assoc., Inc.). Different
protonation states were examined for indinavir and amprenavir, but the
other inhibitors were assumed to be electroneutral. Water molecules
located within 7.0 Å of a ligand atom in the X-ray structure were
retained in the model complex during optimization.

Calculation of binding energies

For low energy conformations found after several iterative BPMC
cycles, protein-drug binding energies were estimated using the equation:

$$E_{\mathrm{bind}} = E_0 + E_{\mathrm{comp}} - E_{\mathrm{ligand}} - E_{\mathrm{prot}},$$

where $E_{\mathrm{comp}}$ is the energy of the complex, $E_{\mathrm{ligand}}$ and $E_{\mathrm{prot}}$ are the energies of the
ligand and protein when separated, and $E_0$ is an adjustable constant. The
binding energies of the protein and ligand were calculated using the
following energy function:

$$E = E_{\mathrm{el}} + E_{\mathrm{vw}} + E_{\mathrm{hb}} + E_{\mathrm{s}},$$

where $E_{\mathrm{el}}$ is the exact-boundary electrostatic energy using ε0 = 8.0, $E_{\mathrm{s}}$ is the
side-chain entropy term, and $E_{\mathrm{vw}}$ and $E_{\mathrm{hb}}$ are the ECEPP/3 van der Waals
and hydrogen-bonding terms.
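
The bookkeeping implied by the two equations above can be sketched in Python as follows. The class and function names are hypothetical, and the numerical values in the usage lines are made-up placeholders chosen only to show the arithmetic; they are not results from this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    """The four terms of the energy function, all in kcal/mol."""
    electrostatic: float       # exact-boundary electrostatic term
    van_der_waals: float       # ECEPP/3 van der Waals term
    hydrogen_bond: float       # ECEPP/3 hydrogen-bonding term
    side_chain_entropy: float  # side-chain entropy term

    def total(self) -> float:
        return (self.electrostatic + self.van_der_waals
                + self.hydrogen_bond + self.side_chain_entropy)


def binding_energy(complex_terms, ligand_terms, protein_terms, e0=0.0):
    """E_bind = E_0 + E_comp - E_ligand - E_prot, with e0 the adjustable constant."""
    return (e0 + complex_terms.total()
            - ligand_terms.total() - protein_terms.total())


# Placeholder numbers for illustration only.
complex_terms = EnergyTerms(-60.0, -45.0, -20.0, 5.0)
ligand_terms = EnergyTerms(-8.0, -6.0, -2.0, 0.0)
protein_terms = EnergyTerms(-50.0, -30.0, -16.0, 2.0)
print(binding_energy(complex_terms, ligand_terms, protein_terms))
```

Keeping the four terms separate, rather than storing only their sum, makes it straightforward to inspect which contribution changes when a mutation is introduced.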
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After the energies of the wild type PR-inhibitor complexes were 
calculated, mutation sites were introduced into the optimized X-ray 
structures or model complexes. The amino acid substitutions were 
followed by local optimization, using an ECEPP/3 force field, of protein 
side chains around the mutation sites via the energy minimization of
substructures that included the ligand, water molecules within a sphere
of radius 7.0 Å around the ligand, and protease residues within a
sphere of radius 3-5 Å around the mutated residues. The energy of
binding of the mutated complex was calculated based on the equation
described herein. The difference in binding energy resulting from
mutations (mut) of the wild-type (WT) protease was calculated using the
following equation:

$$\Delta E_{\mathrm{bind}}(\mathrm{calculated}) = E_{\mathrm{bind}}(\mathrm{WT}) - E_{\mathrm{bind}}(\mathrm{mut}).$$

This change in binding energy was compared to data from experimental
(exptl) studies (Gulnik et al. (1995) Biochemistry 34:9282-9287; Klabe et
al. (1998) Biochemistry 37:8735-8742; Pazhanisami et al. (1996) J. Biol.
Chem. 271:17979-17985; Jacobsen et al. (1995) Virology 206:527-534;
Maschera et al. (1996) J. Biol. Chem. 271:33231-33235) based on the
equation:

$$\Delta E_{\mathrm{bind}}(\mathrm{exptl}) = RT \ln\left(K_i^{\mathrm{mut}} / K_i^{\mathrm{WT}}\right).$$

Plots of $\Delta E_{\mathrm{bind}}(\mathrm{calculated})$ vs. $\Delta E_{\mathrm{bind}}(\mathrm{exptl})$ were generated, and the results,
summarized in Table 3, show a strong correlation between the calculated
binding energies and the experimentally determined binding energies for
the PR-inhibitor complexes. For example, the correlation coefficient R for
PR-ritonavir and PR-amprenavir is 0.9, where R = 1 denotes perfect agreement
between the computationally calculated and experimentally determined
binding energy data. These correlation data validate the computational
protocol and calculations described herein as a method for predicting
protein-drug binding or protein-drug resistance (i.e., non-binding). The
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evaluation of changes in binding energy of protein-drug complexes upon 
protein sequence variations can be used as a possible descriptor and, 
thus, can be used to predict the efficacy of drugs on proteins resulting
from polymorphisms in genes. Moreover, the analysis of the free energy of
binding in complexes between the protein models that are produced by
the method set forth in this example and drugs that have been designed 
or modified is a good predictive tool for drug designers. 

TABLE 3 

Correlation between Experimental and Calculated Binding Energies
for HIV Protease Inhibitors

HIV PR       X-ray        No. of exptl.   Correlation     Correlation
Inhibitor    Complex ID   data points     coefficient R   S.D., kcal/mol
Saquinavir   1HXB         18              0.84            0.68
Indinavir    1HSG         17              0.79            0.80
Ritonavir    1HXW         12              0.90            0.72
Amprenavir   1HPV         15              0.90            0.54
Nelfinavir   1OHR         Insufficient data



Identification of structural invariant regions of HIV Protease 

Clinical effectiveness of HIV PR inhibitors is limited by the rapid
emergence of drug-resistant mutations. Resistant PR variants first arise
by the mutation of amino acids in or near the drug binding
site, which are then accompanied by compensatory mutations of more
distant amino acids. The identification of highly conserved, structurally
invariant regions of PR would provide new potential targets and thus
lead to the development of therapeutics having greater clinical efficacy
than those drugs commonly employed to treat HIV.

The protein sequences of HIV protease were obtained from
GenBank and from the blood samples of patients using standard isolation
and sequencing techniques well known in the art. The protein
sequences were modeled into 3-D structures using the computational
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protocol described in Example 1. The protease sequences were aligned, 
and the frequency of mutation, regardless of type, was determined at 
each amino acid position and plotted in Figure 7, where the frequency of 
mutation in this set of HIV-1 Protease sequences varied from 0 to 40%. 
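
The per-position tally described here can be illustrated with a short Python sketch. It assumes the sequences are already aligned to a reference of equal length and uses a toy set of five-residue fragments; the function names, the treatment of gap characters, and the toy sequences are all hypothetical and are not the data set or software used in this example.

```python
from collections import Counter


def mutation_profile(reference, aligned_sequences):
    """For each position, return (position, mutation frequency, substituted residues).

    A residue counts as mutated when it differs from the reference residue.
    Gap characters ('-') are ignored as substitutions but still count toward
    the number of sequences, which is a simplifying assumption.
    """
    profile = []
    for index, ref_residue in enumerate(reference):
        column = [seq[index] for seq in aligned_sequences]
        substitutions = Counter(
            aa for aa in column if aa != ref_residue and aa != "-"
        )
        frequency = sum(substitutions.values()) / len(column)
        profile.append((index + 1, frequency, set(substitutions)))
    return profile


def invariant_positions(profile, max_frequency=0.0):
    """Positions whose mutation frequency does not exceed max_frequency."""
    return [pos for pos, freq, _ in profile if freq <= max_frequency]


# Toy alignment: the reference is the first five protease residues (PQITL).
reference = "PQITL"
sequences = ["PQITL", "PQVTL", "PQITL", "PQLTL"]
profile = mutation_profile(reference, sequences)
for position, frequency, residues in profile:
    print(position, f"{frequency:.0%}", sorted(residues))
print("invariant:", invariant_positions(profile))
```

The set of substituted residues returned for each position corresponds to the tolerance of that position to different substitution types, and the positions with zero frequency are the candidates for invariant regions.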
Sequence alignment also revealed how many different types of amino
acids could be substituted at any specific residue, yielding the tolerance
of each residue to substitutions of different types. The data showing the
frequency of mutation of each residue among the PR sequences, the types of
mutations, and the distance of the mutating residue from the active site
(Asp 25) are shown in FIG. 8. This information, derived from sequences obtained
from 10591 different genotypes, was used to identify invariant and/or highly
conserved regions of PR and to map these regions to a 3-D structure for
the purpose of identifying new potential regions on the protein as targets
for therapeutic intervention. These invariant regions include, but are not
limited to, residues 1-9, 25-29, 49-52, 78-81, and 94-99, where residue
1 is an aliphatic amino acid, more preferably proline; residue 2 is a
hydrophilic amino acid, more preferably glutamine; residue 3 is an
aliphatic amino acid, more preferably isoleucine; residue 4 is a hydrophilic
amino acid, more preferably threonine; residue 5 is a hydrophobic amino
acid, more preferably leucine; residue 6 is an aromatic amino acid, more
preferably tryptophan; residue 7 is a hydrophilic amino acid, more
preferably glutamine; residue 8 is a basic amino acid, more preferably
arginine; residue 9 is an aliphatic amino acid, more preferably proline;
residue 25 is a hydrophilic amino acid, more preferably aspartic acid;
residue 26 is a hydrophilic amino acid, more preferably threonine; residue
27 is an aliphatic amino acid, more preferably glycine; residue 28 is an
aliphatic amino acid, more preferably alanine; residue 29 is an acidic
amino acid, more preferably aspartic acid; residue 49 is an aliphatic amino
acid, more preferably glycine; residue 50 is a hydrophobic amino acid,
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determine whether drug binding does (i.e. sensitivity) or does not (i.e. 
resistance) take place. 

Sequencing of HIV-1 protease and reverse transcriptase is
performed on HIV-1 cDNA following extraction, reverse transcription, and
PCR amplification of viral RNA obtained from patient specimens, such as
blood samples or other body fluid or tissue samples. Methods for the
extraction, reverse transcription, and PCR amplification of viral RNA are
well known in the art. For each sequence, a computer-generated 3-D
structure of the protein is modeled and then docked with antiviral drugs in
silico using methods described in Example 1 and elsewhere herein to
analyze protein-drug interactions. Antiviral drugs that can be tested
include, but are not limited to, saquinavir, indinavir, ritonavir, amprenavir,
and nelfinavir for HIV protease; zidovudine, lamivudine, stavudine,
zalcitabine, didanosine, abacavir, adefovir, delavirdine, nevirapine, and
efavirenz for HIV reverse transcriptase; and any other FDA-approved or
non-FDA-approved antiviral drug. From these protein-drug interaction studies,
relative drug resistance or sensitivity is inferred by calculating and
evaluating the free energy of binding in low energy conformations of
complexes between the variant protease structure and docked antiviral
drug or variant reverse transcriptase structure and docked antiviral drug,
using the methods described in Examples 1 and 3 and elsewhere herein.

The results of the computational phenotyping procedure can be
presented as a patient report that states whether the RT or PR obtained
from the patient is sensitive or resistant to a drug or drugs. Such a
patient report assists physicians in selecting appropriate drugs for HIV
patients. It also is useful for the in vitro diagnostics industry in an
adjunct test/service capacity to help optimize antiviral therapy.
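
A patient report of this kind could be assembled along the following lines. This Python sketch is purely illustrative: the function names, the sign convention (a positive value means weaker calculated binding to the patient's variant than to wild type), the 1.0 kcal/mol cut-off, and the example numbers are all assumptions introduced here and are not specified in the text.

```python
def classify_response(binding_loss_kcal, resistance_threshold=1.0):
    """Label a variant-drug pair from its calculated loss of binding energy.

    binding_loss_kcal: assumed convention, positive when the drug binds the
    patient's variant more weakly than the wild-type enzyme.
    resistance_threshold: hypothetical cut-off in kcal/mol.
    """
    if binding_loss_kcal >= resistance_threshold:
        return "resistant"
    return "sensitive"


def patient_report(patient_id, binding_losses, resistance_threshold=1.0):
    """Build a plain-text report from a mapping of drug name to binding loss."""
    lines = [f"Computational phenotype report for {patient_id}"]
    for drug in sorted(binding_losses):
        loss = binding_losses[drug]
        call = classify_response(loss, resistance_threshold)
        lines.append(f"{drug}: {call} (binding loss {loss:+.1f} kcal/mol)")
    return "\n".join(lines)


# Made-up values for a hypothetical patient specimen.
example = {"saquinavir": 0.3, "indinavir": 2.1, "ritonavir": 1.4}
print(patient_report("specimen-001", example))
```

In practice the cut-off would have to be calibrated against experimental phenotyping data for each drug; a single fixed value is used here only to keep the sketch short.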
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EXAMPLE 5 

HIV Protease and Reverse Transcriptase Databases 

Exemplary databases of the 3-D protein structures of polymorphic
variants are described in this example. The HIV PR and RT databases are
a comprehensive collection of 3-D polymorphic structural data along with
related information, including nucleic acids encoding all or a portion of the
protein. These data provide a means to understand differences in the
interactions between a drug or drugs and the structural variations of the
drug targets.

This example describes the creation of, interface for, and use of structural
variant databases of HIV protease and reverse transcriptase polymorphic
variants.

Construction of databases 

To implement the HIV RT or PR database described herein, a suitable
computer for performing database server tasks includes a "Pentium" level
CPU having at least 128 MB of memory, 30 GB of disk storage, and 256
MB of disk swap space for files. A recommended configuration for better
computer performance would include, for example, a "Pentium III"
processor at 700 MHz or faster, memory of 256 MB or greater, disk
storage space of 50 GB or more, and swap space of 500 MB or more. A
suitable configuration for performing user tasks as described above
includes a "Pentium" level CPU having 128 MB memory, disk space of
240 MB with swap space of 256 MB, and an optional display circuit card
supporting OpenGL and having 4 MB of memory. A recommended
configuration for better performance would include, for example, a
"Pentium III" processor at 500 MHz or faster, memory of 256 MB or
greater, disk space of 500 MB or more, swap space of 500 MB or more,
and an optional display card having 8 MB of memory or more, supporting
resolution of 1024 x 768.
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Preferably, the software used in the computing system described 
above includes, for the server machine, operating system software such 
as "Windows NT Server 4.0" from Microsoft Corporation, with Service 
Pack 5, Version 1280 (10 June 1999) or more recent, with database 
5 management server software such as "Oracle Server Standard Edition 
8.1" from Oracle Corporation, or better. The software used in a preferred 
embodiment of the user machine includes operating system software such 
as "Windows NT Workstation 4.0" from Microsoft Corporation, with 
Service Pack 5, version 1280 (10 June 1999) or more recent, as well as 

10 "Oracle Client Standard Edition Version 8.1" or better. The client 

machine will also be compliant with the "Java" programming language 
(Java Runtime Environment 1 .2.2). As will be known to those skilled in 
the art, other configurations may be suitable, depending on the 
applications being used and the computer performance desired. 

Database Interface

The database interface is a Java-based interface. The database is
interfaced to a molecular graphics package that includes 3-D
visualization (wire-frame representations, secondary structure ribbons,
and solid surfaces) and structure analysis tools. The database also
provides an interface to access all of the collected files from the same
3-D structure. The database interface also provides access to other
databases, such as databases of chemical structures and public domain
databases such as GenBank and the Protein Data Bank. The OpenGL and
C++ module has real-time interaction with the sequence display and
sequence analysis modules, such that highlighting residues in one
display results in highlighting those same residues in other displays.

The relational database containing the protein information may be
structured according to relational objects to facilitate the analysis and
computation processes described in the preceding examples. FIG. 10 is a
graphical representation of the database objects for the system described
herein. The database is organized by classes, each of which is
characterized by data attributes and subclasses for the proteins.

FIG. 10 shows that the database design includes classes
comprising Variant and related classes of Sample, Residue, Model,
Resistance_Entry, and Protein. Other classes include Conformation,
Residue_Conformation, Atom, Drug, Family, and Subfamily. These
classes store attribute data values and specify class parameters and
behaviors to provide the functionality described herein.

For example, FIG. 10 shows that the Variant class stores
parameters to specify a variant, including subclasses that specify a
Variant_ID, Sample_ID, Protein_ID, Name, and Sequence, where
Variant_ID is the identification number of the variant; Sample_ID is the
identification number of the sample from which HIV PR and RT were
obtained; Protein_ID is the identification number of the protein, i.e., PR or
RT; Name is the name of the variant distinguishing it from other variants
encoded by the same DNA due to ambiguities in the nucleic acid
sequence; and Sequence is the nucleotide or amino acid sequence.

Similarly, FIG. 10 shows that the Sample class includes subclasses
relating to a specific sample and which specify Sample_ID, Sample_Date,
Sex, Ambiguity_Number, Distance, Sequence_Length, Sequence, Clade,
and Region, where Sample_ID is as defined herein; Sample_Date is the
date the sample was obtained; Sex is the gender of the sample donor;
Ambiguity_Number is the fraction of ambiguous nucleotide positions;
Distance is a normalized number representing the variation of an amino
acid from the master clade; Sequence_Length is the length of the
sequence; Sequence is as defined herein; Clade is the master sequence;
and Region is the geographic location from which the sample was obtained.

The Model class includes subclasses comprising Model_ID, Model_Name,
Variant_ID, and Drug_ID, where Model_ID is the identification number of
the 3-D protein model; Model_Name is the name of the 3-D protein model;
Variant_ID is as defined herein; and Drug_ID is the identification number
of the drug, i.e., antiviral drug.

The Atom class includes the subclasses comprising Atom_Name,
Residue_Conformation_ID, X_Coordinate, Y_Coordinate, and Z_Coordinate,
where Atom_Name is the name of the atom in the 3-D protein structure;
Residue_Conformation_ID is the identification number of the amino acid
conformation in a 3-D structure; and X_Coordinate, Y_Coordinate, and
Z_Coordinate are the coordinates of the 3-D protein structure.

The Conformation class includes the subclasses comprising
Conformation_ID, Model_ID, and Refinement_Level, where Conformation_ID
is the identification number of a conformation of a 3-D structure;
Model_ID is as defined herein; and Refinement_Level is the number of
times the conformation was refined energetically.

The Drug class includes the subclasses comprising Drug_ID, Profile,
Symbol, Name1, Name2, Company, and URL, where Drug_ID is as defined
herein; Symbol is the FDA symbol for the drug; Name1 is the name of the
drug; Name2 is an alternative name of the drug; Company is the company
that makes the drug; and URL is the website address of the company that
makes the drug.

The Residue_Conformation class includes the subclasses comprising
Residue_Conformation_ID, Conformation_ID, and Residue_ID, where
Residue_Conformation_ID is as defined herein; Conformation_ID is as
defined herein; and Residue_ID is the identification number of the
amino acid.

The Resistance_Entry class includes the subclasses comprising
Resistance_Entry_ID, Profile, Protein_ID, Residual_Number, Amino_Acid,
Weight, and Maximum_Weight, where Resistance_Entry_ID is; Protein_ID is
as defined herein, Amino_Acid is the amino acid.

The Family class includes the subclasses comprising Family_ID and
Family_Name, where Family_ID is the identification number of the protein
family and Family_Name is the name of the protein family. The SubFamily
class includes the subclasses comprising SubFamily_ID, SubFamily_Name,
and Family_ID, where SubFamily_ID is the identification number of the
protein subfamily, SubFamily_Name is the name of the protein subfamily,
and Family_ID is as defined herein.

The Protein class includes the subclasses comprising Protein_ID,
Protein_Name, Species, Multiple_Domain, Multiple_Chain, and Wild_Type,
where Protein_ID is as defined herein; Protein_Name is the name of the
protein, i.e., RT or PR; Species is the species of the source of the
protein, i.e., humans; Multiple_Domain is the domain of the protein, i.e.,
p66 or p51 in the case of RT; Multiple_Chain is the a or b chain in the
dimers of RT and PR; and Wild_Type is the wild-type protein sequence for
RT and PR.

The Residue class includes the subclasses comprising Residue_ID,
Variant_ID, Chain, Residue_Number, Insertion_Code, and Residue_Code,
where Residue_ID is the identification number of the amino acid,
Variant_ID is as defined herein, Chain, Residue_Number is the numbering
of an amino acid in a protein sequence, Insertion_Code is the
identification number if different insertions occur in the amino acid
sequence, and Residue_Code is the single letter or 3-letter code of an
amino acid.

Those skilled in the art will understand the database design exemplified
in FIG. 10. It should be understood that other classes or parameters may
be included, as selected by those skilled in the art, for the desired
database design.

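
A minimal sketch of how a few of the classes described above could be expressed as relational tables is given below, using the SQLite engine bundled with Python rather than the Oracle server of the preferred embodiment. FIG. 10 is not reproduced here, so the column types, the key relationships, the subset of columns shown, and the helper names build_demo_db and models_for_region are assumptions made for illustration; the inserted rows are placeholder data.

```python
import sqlite3

# Assumed table layout for a subset of the classes; types and keys are guesses.
SCHEMA = """
CREATE TABLE Protein (Protein_ID INTEGER PRIMARY KEY, Protein_Name TEXT);
CREATE TABLE Sample  (Sample_ID INTEGER PRIMARY KEY, Sample_Date TEXT,
                      Sex TEXT, Clade TEXT, Region TEXT);
CREATE TABLE Variant (Variant_ID INTEGER PRIMARY KEY,
                      Sample_ID INTEGER REFERENCES Sample(Sample_ID),
                      Protein_ID INTEGER REFERENCES Protein(Protein_ID),
                      Name TEXT, Sequence TEXT);
CREATE TABLE Drug    (Drug_ID INTEGER PRIMARY KEY, Name1 TEXT);
CREATE TABLE Model   (Model_ID INTEGER PRIMARY KEY, Model_Name TEXT,
                      Variant_ID INTEGER REFERENCES Variant(Variant_ID),
                      Drug_ID INTEGER REFERENCES Drug(Drug_ID));
"""


def build_demo_db():
    """Create an in-memory database holding one placeholder record per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Protein VALUES (1, 'PR')")
    conn.execute("INSERT INTO Sample VALUES (1, '2000-01-01', 'F', 'B', 'US')")
    conn.execute("INSERT INTO Variant VALUES (1, 1, 1, 'variant-1', 'PQITL')")
    conn.execute("INSERT INTO Model VALUES (1, 'model-1', 1, NULL)")
    return conn


def models_for_region(conn, region):
    """Return names of 3-D models whose source sample came from the given region."""
    rows = conn.execute(
        """SELECT m.Model_Name FROM Model m
           JOIN Variant v ON v.Variant_ID = m.Variant_ID
           JOIN Sample s ON s.Sample_ID = v.Sample_ID
           WHERE s.Region = ? ORDER BY m.Model_ID""",
        (region,),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(models_for_region(build_demo_db(), "US"))
```

Linking Model to Variant and Variant to Sample through the shared identification numbers is what allows a model to be retrieved by a sample attribute such as Region.
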
Database Content 

The databases contain information on the variants of HIV PR and
RT present in patient populations. The master amino acid sequence,
nucleic acid sequence, and 3-D structure are obtained from GenBank; an
exemplary master sequence is set forth in SEQ ID No. 118. Nucleotide
sequences exhibiting polymorphisms and the corresponding structural
variant protein sequences are determined by isolating nucleic acid from viruses
and viral nucleic acid obtained from the blood samples of patients
throughout the US, as well as from other countries, using sequencing
methods well known in the art. The sequences were inputted into the RT
and PR databases. Exemplary nucleotide sequences and the
encoded amino acids for HIV RT and PR in this database are set forth in
SEQ ID NOS. 3 to 117, where r is g or a; y is t/u or c; m is a or c; k is g
or t/u; s is g or c; w is a or t/u; b is g or c or t/u; d is a or g or t/u; h is a
or c or t/u; v is a or g or c; and n is a or g or c or t/u or unknown or
other. The amino acid sequences of the wild type and structural variants
are used to create 3-D protein structures which are deposited into the
databases.
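
The ambiguity codes listed above lend themselves to a simple lookup table. The following sketch, offered only as an illustration, computes the fraction of ambiguous positions in a nucleotide sequence and enumerates the unambiguous sequences it could represent. The function names are hypothetical, "t" stands for t/u, and "n" is treated simply as any of the four bases, which ignores the "unknown or other" case.

```python
from itertools import product

# Ambiguity codes as listed in the text ("t" stands for t/u).
AMBIGUITY = {
    "r": "ga", "y": "tc", "m": "ac", "k": "gt", "s": "gc", "w": "at",
    "b": "gct", "d": "agt", "h": "act", "v": "agc",
    "n": "agct",  # assumption: "unknown or other" is not modeled
}


def ambiguity_fraction(seq):
    """Fraction of positions in seq that carry an ambiguity code."""
    seq = seq.lower()
    if not seq:
        return 0.0
    return sum(ch in AMBIGUITY for ch in seq) / len(seq)


def expand(seq):
    """List every unambiguous sequence consistent with seq."""
    choices = [AMBIGUITY.get(ch, ch) for ch in seq.lower()]
    return ["".join(p) for p in product(*choices)]


if __name__ == "__main__":
    print(expand("ary"))              # four candidate codons
    print(ambiguity_fraction("ary"))  # two of three positions are ambiguous
```

The number of expanded sequences grows multiplicatively with each ambiguous position, so full enumeration is practical only for short stretches or for sequences with few ambiguities.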

1. 3-D Protein Models

The structures of the wild-type or master sequence models of PR and
RT were obtained from the crystal structures found in PDB. The initial
structure was refined energetically using BPMC with an ECEPP force field
as described in Example 1. The quality of the model was assessed by
calculating Normalized Residue Energies (NREs), where models with e_av ≥
1.5 required further energetic refinement, and models with e_av < 1.5 were
deposited into the database as described herein. The 3-D protein
structures of the variant sequences were generated by comparing these
structures to the master sequence (see, e.g., SEQ ID No. 118; i.e.,
homology modeling) and energetically refining the models ab initio, using
the same force field and BPMC procedure as the master sequence and
applying the same quality control standard as described herein. Figure 11
is a tabulation of the 3-D coordinates of an exemplary HIV PR entry in a
database that includes 3-D structures. For US purposes and where
permitted, Tables 4 and 5 are provided electronically on CD ROM. These
Tables house the coordinates that represent the 3-D protein structures of
proteins encoded by the nucleic acids set forth in SEQ ID NOS. 3-117.
It will be noted that these sequences encode a full length PR and about
200 nucleotides of the p51 subunit, which is the subunit of interest herein.
To construct the full-length 3-D structure, the 3-D structure of each
encoded portion of the p51 subunit was generated and then combined
with the structure of the master sequence to produce a full-length
structure.
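
The quality control step described above amounts to a refine-and-score loop that stops once the average normalized residue energy falls below 1.5. A hedged sketch follows; the callables refine and score are hypothetical stand-ins for a BPMC refinement cycle and an NRE calculation, neither of which is implemented here, and the cap on the number of cycles is an added assumption.

```python
def refine_until_acceptable(model, refine, score, threshold=1.5, max_cycles=20):
    """Refine a model until its average NRE (e_av) is below the threshold.

    model      -- any object representing a 3-D structure
    refine     -- hypothetical callable performing one refinement cycle
    score      -- hypothetical callable returning e_av for a model
    threshold  -- acceptance limit taken from the text (1.5)
    max_cycles -- assumed safety cap, not taken from the text
    Returns (model, e_av, cycles_used).
    """
    e_av = score(model)
    cycles = 0
    while e_av >= threshold:
        if cycles >= max_cycles:
            raise RuntimeError("model did not reach the quality threshold")
        model = refine(model)
        cycles += 1
        e_av = score(model)
    return model, e_av, cycles


if __name__ == "__main__":
    # Toy stand-ins: the "model" is just a number that halves on each cycle.
    print(refine_until_acceptable(4.0, refine=lambda m: m / 2, score=lambda m: m))
```

Only models returned by this loop would be deposited; a model that fails to converge raises an error instead of being stored.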

These 3-D structures in the database can be selected and exported
into computational docking programs for analyzing protein-drug
interactions with known drugs, new drugs or modified drugs. The database
can be mined to find protein models that correspond to patients with a
particular genetic polymorphism, to patients with the most commonly
occurring polymorphism, to a relevant patient subpopulation (e.g., gender,
age, race, or other characteristic), to patients receiving a specific
treatment regimen, to patients exhibiting a particular clinical response, to
structural invariants, or to other relevant criteria.

Drugs can be docked into the active sites of PR and RT and subsequently
energetically refined using an ECEPP force field and BPMC as described in
Example 1. The quality control is that the protein-drug complex
represents a low energy conformation, which may take several iterative
BPMC cycles. Then, the binding energies of the protein-drug complexes
can be estimated using the methods of Example 1. Drug designers can
modify the structures of drugs or design new drugs, using methods well
known in the art, to maximize the drug binding to the models generated
by this database.

2. Other Data 

Each PR or RT nucleotide sequence in the database has associated
with it an identification number, the nucleotide sequence length, the
translated amino acid sequence (or sequences in cases of ambiguous
nucleotide positions), a 3-D structure for each amino acid sequence (from
which a number of structurally related values are calculated), the
genotyping date, the gender of the patient, the geographical location from
which the sample was sent, the clade of the sequence, the fraction of
ambiguous nucleotide positions, drug information, and other clinical
information.

Database Usage 

A query menu allows the user to retrieve data based on the various
fields: sample ID, residue number (with or without specific amino acid
mutation), date, gender, geographic location, distance from the master
sequence, and other useful queries. The set of sequences that satisfies
the user's query is brought up in a sequence display module, initially
with variations from the master sequence indicated, although the
sequences can also be highlighted according to predicted resistance. This
subset of sequences can be subjected to further analyses. For example, a
histogram summarizing the number of mutations at each position in the
subset can be generated. The 3-D structures for any of the variants in
the database can be displayed and analyzed in the structure visualization
module, allowing the user to compare the similarities and differences
between 3-D structures by superimposing the 3-D structures. The user
can also export these structures into programs for protein-binding studies
as described herein. Thus, by mining the databases, a user will access
3-D structures and clinical and sample information that can be used in and
correlated with protein-drug binding studies of HIV PR and RT.
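
The per-position mutation histogram mentioned above can be sketched in a few lines. The example assumes that the variant sequences are already aligned to the master sequence and have the same length; the function name is hypothetical, and the variant sequences shown are invented for illustration (the master fragment uses the first five PR residues recited in the claims).

```python
from collections import Counter


def mutation_histogram(master, variants):
    """Count, for each 1-based position, how many variants differ from the master.

    Assumes every variant is pre-aligned to the master and of equal length.
    Positions with no mutations are omitted from the result.
    """
    counts = Counter()
    for seq in variants:
        if len(seq) != len(master):
            raise ValueError("variant length differs from master length")
        for pos, (ref, obs) in enumerate(zip(master, seq), start=1):
            if obs != ref:
                counts[pos] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    master = "PQITL"                       # first five residues of PR
    subset = ["PQITL", "PQVTL", "PKVTL"]   # invented variants
    print(mutation_histogram(master, subset))
```

Restricting the input list to the sequences returned by a query gives the histogram for that subset only.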

Database Applications

The HIV PR and RT databases have many applications. The
applications include, but are not limited to, any application and method
provided herein, such as assisting in de novo drug design and
drug binding calculations. In particular, the database can be used in the
design of 2nd and 3rd generation drugs to combat potential resistance to
HIV therapy, and it can be used in the design of drugs that will impact a
broad spectrum of the infected population. The databases provide the
ability to design drugs that focus on the most highly conserved regions of
a drug target and drugs that will avoid resistance due to mutation. The
database could be used to rank drug candidates by likely efficacy within a
given subpopulation of patients (e.g., age, race, gender) in pre-clinical
trials, to predict the most effective drug regimen to give a patient, and
to design clinical trials.

Since modifications will be apparent to those of skill in this art, it is 
intended that this invention be limited only by the scope of the appended 
claims.

CLAIMS

1. A computer-based method of drug design based on genetic
polymorphisms, comprising:

obtaining more than one amino acid sequence of target proteins
that are the product of a gene exhibiting genetic polymorphisms, wherein
the sequences represent different genetic polymorphisms;

generating 3-dimensional (3-D) protein structural variant models
from the sequences; and

based upon the structures of the 3-D models, designing drug
candidates, modifying existing drugs, identifying potential drug
candidates or identifying modifications of existing drugs based on
predicted intermolecular interactions of the drug candidates or modified
drugs with the structural variants.

2. The method of claim 1, wherein the structure-based drug
design method comprises:

computationally docking the drug candidate or modified drug
molecules with the target protein structural variant models;

energetically refining the docked complexes;

determining the binding interactions between the drug candidate or
modified drug molecules and the structural variants; and

designing and identifying drugs or modifications to existing drugs
based on the binding interactions.

3. The method of claim 2 wherein the binding interactions are 
determined by: 

calculating the free energy of binding between the protein
structural variant model and the docked molecule; and

decomposing the total free energy of binding based on the 
interacting residues in the protein active site. 

4. The method of claim 1 wherein:

after the protein structural variant models derived from a particular
genetic polymorphism are generated, selected model structures are
analyzed to determine common structural features that are conserved
throughout the selected models, wherein

the conserved structural features are used as a basis for
structure-based drug design studies.

5. The method of claim 4, wherein the conserved structural 
features are stretches of non-contiguous residues, wherein each stretch 
contains at least two amino acids. 

6. The method of claim 5, wherein the protein is human
immunodeficiency virus protease.

7. The method of claim 6, wherein the conserved residues
comprise residues 1-9, 25-29, 49-52, 78-81 and 94-99; and wherein:

residue 1 is an aliphatic amino acid; residue 2 is a hydrophilic
amino acid; residue 3 is an aliphatic amino acid; residue 4 is a hydrophilic
amino acid; residue 5 is a hydrophobic amino acid; residue 6 is an
aromatic amino acid; residue 7 is a hydrophilic amino acid; residue 8 is a
basic amino acid; residue 9 is an aliphatic amino acid; residue 25 is an
acidic amino acid; residue 26 is a hydrophobic amino acid; residue 27 is
an aliphatic amino acid; residue 28 is an aliphatic amino acid; residue 29
is an acidic amino acid; residue 49 is an aliphatic amino acid; residue 50
is a hydrophobic amino acid; residue 51 is an aliphatic amino acid; residue
52 is an aliphatic amino acid; residue 78 is an aliphatic amino acid;
residue 79 is an aliphatic amino acid; residue 80 is a hydrophilic amino
acid; residue 81 is an aliphatic amino acid; residue 94 is an aliphatic
amino acid; residue 95 is a thio-containing amino acid; residue 96 is a
hydrophilic amino acid; residue 97 is a hydrophobic amino acid; residue 98
is a hydrophilic amino acid; and residue 99 is an aromatic amino acid.

8. The method of claim 6, wherein the conserved residues
comprise residues 1-9, 25-29, 49-52, 78-81 and 94-99; and wherein:

residue 1 is proline; residue 2 is glutamine; residue 3 is isoleucine; residue
4 is threonine; residue 5 is leucine; residue 6 is tryptophan; residue 7 is
glutamine; residue 8 is arginine; residue 9 is proline; residue 25 is aspartic
acid; residue 26 is threonine; residue 27 is glycine; residue 28 is alanine;
residue 29 is aspartic acid; residue 49 is glycine; residue 50 is isoleucine;
residue 51 is glycine; residue 52 is glycine; residue 78 is glycine; residue
79 is proline; residue 80 is threonine; residue 81 is proline; residue 94 is
glycine; residue 95 is cysteine; residue 96 is threonine; residue 97 is
leucine; residue 98 is asparagine; and residue 99 is phenylalanine.

9. The method of claim 6, wherein the HIV protease has the
sequence of amino acids set forth in any of SEQ ID Nos. 3-74 and 77-117.

10. The method of claim 9, wherein the residues comprise residues 
1-9, 25-29, 49-52, 78-81 and 94-99. 

10. The method of claim 1, wherein the selected model 
structures represent the structural variants resulting from the most 
commonly occurring genetic polymorphisms.

11. The method of claim 1, wherein the selected model
structures represent the structural variants resulting from genetic 
polymorphisms found in a selected patient subpopulation. 

12. The method of claim 1 wherein the structural variant models
are stored in a relational database, comprising:

3-D molecular coordinates for the structural variants;

a molecular graphics interface for 3-D molecular structure
visualization;

computer functionality for protein sequence and
structural analyses; and

database searching tools.

13. The method of claim 12, wherein the database further 
comprises one or more of observed clinical data associated with the 
genetic polymorphisms, subject medical history and subject history. 

14. The method of claim 1, wherein:

after generating the 3-D protein structural variant models,
the method comprises:

computationally docking drug molecules with the target
protein models; and

energetically refining the docked complexes; and

wherein the candidate drugs are specific for a protein with a 
selected polymorphism or specifically interact with all proteins exhibiting a 
polymorphism. 

15. The method of claim 14, wherein the structure-based drug 
design method comprises:

computationally docking drug or potential new drug candidate 
molecules with the target protein structural variant models; 

energetically refining the docked complexes; 

determining the binding interactions between the drug or potential 
new drug candidate molecules and the structural variants; and

designing potential new drugs or modifications to existing drugs 
based on the binding interactions. 

16. The method of claim 15, wherein the binding interactions are 
determined by: 

calculating the free energy of binding between the protein
structural variant model and the docked molecule; and

decomposing the total free energy of binding based on the 
interacting residues in the protein active site. 

17. The method of claim 14, wherein:

after the protein structural variant models derived from a particular
genetic polymorphism are generated, selected model structures are
analyzed to determine common structural features that are conserved
throughout the selected models; and

the conserved structural features are used as a basis for
structure-based drug design studies.

18. The method of claim 17, wherein the selected model 
structures represent the structural variants resulting from the most 
commonly occurring genetic polymorphisms. 

19. The method of claim 17, wherein the selected model
structures represent the structural variants resulting from genetic
polymorphisms found in a specific patient subpopulation.

20. The method of claim 12, wherein the selected model
structures represent structural variants derived from patients that receive a
specific treatment regimen.

21. The method of claim 12, wherein the selected model 
structures represent structural variants derived from patients that exhibit 
a particular clinical response to a given drug.

22. The method of claim 12, wherein the selected model
structures represent structural variants derived based on the duration of a
particular drug treatment.

23. The method of claim 12, wherein the structural variant 
models are stored in a relational database, comprising: 

3-D molecular coordinates for the structural variants;

a molecular graphics interface for 3-D molecular structure
visualization;

functionality for protein sequence and structural analysis; and

database searching tools.

24. The method of claim 12, wherein the database further
comprises observed clinical data associated with the genetic
polymorphisms, subject medical history and subject history.

25. A computer-based method of selecting drug therapies for 
patients based on genetic polymorphisms, comprising:

obtaining amino acid sequences of a target protein that is the 
product of a gene exhibiting genetic polymorphisms, wherein the 
sequences represent different genetic polymorphisms; 

generating 3-D protein structural variant models from the 
sequences;

computationally docking drug molecules with the target protein 
models; 

energetically refining the docked complexes; 

determining the binding interactions between the drug or potential 
new drug candidate molecules and the models; and

selecting drug therapies based on the drug or drugs that have the 
most favorable binding interactions with the structural variant models. 

26. The method of claim 25, wherein the binding interactions are 
determined by: 

calculating the free energy of binding between the protein
structural variant and the docked drug molecule; and

decomposing the total free energy of binding based on the 
interacting residues in the protein active site. 

27. The method of claim 1, further comprising, after generating the 3-D
structural variant models, exporting some or all of the models into a
program that computationally docks the models with test compounds to
assess intermolecular interactions.

28. A computer-based method for predicting clinical responses in
patients based on genetic polymorphisms, comprising:

obtaining one or more amino acid sequences for a target protein
that is the product of a gene exhibiting genetic polymorphisms;

generating 3-D protein structural variant models from the
sequences;

building a relational database of protein structural variants derived
based on genetic polymorphisms and observed clinical data associated
with particular polymorphisms exhibited in the patients, wherein the
database comprises:

3-D molecular coordinates for the structural variant models;

a molecular graphics interface for 3-D molecular structure
visualization;

computer functionality for protein sequence and structural
analysis;

database searching tools; and

observed clinical data associated with the genetic
polymorphisms, subject medical history and subject history
associated with the genetic polymorphisms;

obtaining a target protein structural variant based on the same gene
associated with a polymorphism in a patient;

generating a 3-D protein model based on the subject's gene
sequence;

screening/comparing the 3-D model derived from the subject to the
structures contained in the database by:

identifying structures in the database that are similar to the
model derived from the subject; and

predicting a clinical outcome for the patient based on the
clinical data associated with the identified structures.

29. A computer-based method for designing therapeutic agents
that are active against biological targets that have become drug resistant
due to genetic mutations, comprising:

obtaining a first 3-D protein structural variant model of a target 
protein against which a given drug has biological activity;

generating a second 3-D protein structural variant model of the 
target in which genetic mutations have occurred and against which the 
same drug is no longer biologically active; 

comparing the structures of the first and second model to identify 
structural differences; and

performing structure-based drug design calculations in order to 
identify new drugs or modifications to the existing drug to bring about 
biological activity against the second model. 

30. A computer-based method for identifying compensatory 
mutations in a target protein, comprising:

obtaining the amino acid sequence of a target protein containing 
multiple amino acid mutations that is expressed in a patient, wherein the 
structure of a form of the target protein that responds to a particular 
drug, including the active site, has been structurally characterized; 

generating a 3-D structural model of the mutated protein;

comparing the structure of the mutated protein with the form of the 
protein that responds to the drug to identify structural differences and/or 
similarities arising from the mutations; 

comparing the biological activities of the drug against both the 
mutated protein and the form of the protein that responds to the drug to
determine the effects of the mutations on drug response; and

identifying the mutations in the protein that affect biological
activity based on the comparisons.

31. A method for creating a 3-D structural polymorphism
relational database, comprising:

obtaining one or more amino acid sequences of a target protein 
that is the product of a gene exhibiting a genetic polymorphism, wherein 
the sequences represent different genetic polymorphisms;

generating 3-D protein structural variant models from the 
sequences; 

energetically refining the models; 

evaluating the quality of the models; 

optionally obtaining associated clinical properties or data; and

inputting the model and any associated properties and/or data into 
a relational database. 

32. The method of claim 31, wherein after energetically refining 
the models, the models are further refined. 

33. The method of claim 31, wherein the database comprises
amino acid sequences of two or more polymorphic variants.

34. The method of claim 31, wherein the database comprises
amino acid sequences of ten or more polymorphic variants.

35. The method of claim 31, wherein the database comprises
amino acid sequences of about 100 or more polymorphic variants.

36. The method of claim 31, wherein the database comprises
amino acid sequences of about 1000 or more polymorphic variants.

37. The method of claim 31, wherein the database comprises
amino acid sequences of more than 8000 polymorphic variants.

38. A database created by the method of claim 31.

39. The database of claim 38, comprising variant 3-dimensional 
structures of a selected target. 

40. The database of claim 38 that comprises structures of
proteases or polymerases.

41. The database of claim 38, wherein the proteases are viral
proteases or polymerases.

42. The database of claim 38, wherein the viral proteases are
human immunodeficiency virus proteases and the polymerase is a viral
reverse transcriptase.

43. The method of claim 31, wherein quality is assessed by
computing the normalized residue energies such that if e_av is > 1.5 a
model is further refined until e_av is < 1.5; if e_av is < 1.5 a model is
deposited into the database.

44. The method of claim 1, wherein the target is an enzyme.

45. The method of claim 44, wherein the enzyme is a protease 
or polymerase. 

46. The method of claim 45, wherein the polymerase is a reverse 
transcriptase. 

47. The method of claim 44, wherein the target is a protein
expressed by an infectious agent.

48. The method of claim 44, wherein the target is an enzyme
expressed by an infectious agent.

49. The method of claim 48, wherein the agent is a human 
immunodeficiency virus (HIV).

50. A computer system, comprising a database containing data 
representative of the three dimensional structure of polymorphic variants 
of a drug target. 

51. The system of claim 50, wherein the target is a cell surface
receptor or an enzyme.

52. The system of claim 50, wherein the enzyme is a protease or 
a polymerase. 

53. A database, comprising: 
sequences of nucleotides encoding a protein or portions thereof, 
wherein proteins comprise polymorphic variants; and the portions encode 
a domain of the protein that comprises a site in the protein that binds to a 
drug candidate; and 
the coordinates of 3-dimensional (3-D) structures of the encoded 
proteins or portions thereof.

54. The database of claim 53 that is a relational database. 

55. The database of claim 53 that comprises at least 2 
polymorphic variants and the corresponding 3-D structures. 

56. The database of claim 55 that comprises more than 10, 
more than 100, more than 1000, more than 8000, or more than 10,000 
polymorphic variants and the corresponding 3-D structures.

57. The database of claim 53, wherein the protein is a receptor 
or enzyme from a eukaryotic or prokaryotic organism. 

58. The database of claim 53, wherein the organism is a 
pathogen or a mammal.

59. The database of claim 53, wherein the pathogen is a virus 
or bacterium and the mammal is a human.

60. The database of claim 53, wherein the protein is a protease 
or a reverse transcriptase.

61. A database, comprising the sequences of nucleotides set 
forth in SEQ ID Nos. 3-117 that encode HIV protease or the portion of 
HIV reverse transcriptase set forth in each SEQ ID. 

62. The database of claim 53, further comprising 3-D structural 
coordinates for a protein or portion thereof comprising sequences of 
amino acids encoded by each of SEQ ID Nos. 3-117.

63. The database of claim 54, wherein the protein is HIV 
protease. 

64. The database of claim 54, wherein the protein is HIV reverse 
transcriptase.

65. The method of claim 1, wherein the target protein is a 
eukaryotic or prokaryotic protein. 

66. The method of claim 1, wherein the target protein is an 
animal protein, a plant protein or a protein from a pathogen.
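
The quality criterion recited in claim 43 (refine a model while e_av is above 1.5, deposit it once e_av is below 1.5) can be illustrated with a minimal sketch. The claims do not say how e_av is computed or how refinement is performed, so the sketch makes loud assumptions: e_av is taken to be the mean of the normalized residue energies, the refinement step is a caller-supplied function, a value exactly equal to 1.5 is treated as still needing refinement, and the names average_energy, refine_and_deposit and toy_refine are hypothetical.

```python
from typing import Callable, List, Sequence

E_AV_CUTOFF = 1.5  # threshold recited in claim 43


def average_energy(residue_energies: Sequence[float]) -> float:
    """Return e_av, assumed here to be the mean normalized residue energy."""
    if not residue_energies:
        raise ValueError("model has no residue energies")
    return sum(residue_energies) / len(residue_energies)


def refine_and_deposit(
    residue_energies: Sequence[float],
    refine: Callable[[List[float]], List[float]],
    database: List[List[float]],
    max_rounds: int = 100,
) -> int:
    """Refine until e_av is below the cutoff, then deposit the model.

    Returns the number of refinement rounds that were needed.
    """
    current = list(residue_energies)
    rounds = 0
    # Assumption: e_av exactly equal to the cutoff still needs refinement.
    while average_energy(current) >= E_AV_CUTOFF:
        if rounds == max_rounds:
            raise RuntimeError("model did not reach the cutoff")
        current = list(refine(current))
        rounds += 1
    database.append(current)  # deposit only after the check has passed
    return rounds


def toy_refine(energies: List[float]) -> List[float]:
    """Hypothetical stand-in for a real refinement step: scale energies down."""
    return [0.9 * e for e in energies]
```

In this sketch a model whose energies already average below the cutoff is deposited without any refinement, and the max_rounds guard (also an assumption) prevents an endless loop when refinement never converges.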
SEQUENCE LISTING 

<110> Kalyanaraman Ramnarayan 
Edward T. Maggio 
P. Patrick Hess 

<120> Use of Computationally Derived Protein Structures of Genetic Polymorphisms 
in Pharmacogenomics for Drug Design and Clinical Applications 

<130> 24737-1906PC 

<140> Unassigned 
<141> 2000-11-10 

<150> 09/438,566 
<151> 1999-11-10 

<150> 24737-1906B 
<151> 2000-11-01 

<160> 118 

<170> FastSEQ for Windows Version 4.0 

<210> 1 

<211> 6 

<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Modified Hepatitis C Virus NS3 Protease Inhibitor 
Peptide 



<221> ACETYLATION 
<222> 1 

<221> MOD_RES 
<222> 2 

<223> D-glutamic acid 

<221> MOD_RES 
<222> 5 

<223> beta-cyclohexylalanine 
<300> 

<301> Ingallinella, P., Altamura, S., Bianchi, E., Talia 

<302> Potent Peptide Inhibitors Of Human Hepatitis C Vir 

<303> Biochemistry 

<304> 37 

<305> 25 

<306> 8906-8914 

<307> 1998-06-23 

<400> 1 

Asp Xaa Leu Ile Xaa Cys 
1 5 

<210> 2 
<211> 6 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Modified Hepatitis C Virus NS3 Protease Inhibitor 
Peptide 

<221> ACETYLATION 
<222> 1 

<221> MOD_RES 
<222> 5 

<223> beta-cyclohexylalanine 
<300> 

<301> Ingallinella, P., Altamura, S., Bianchi, E., Talia 

<302> Potent Peptide Inhibitors Of Human Hepatitis C Vir 

<303> Biochemistry 

<304> 37 

<305> 25 

<306> 8906-8914 

<307> 1998-06-23 

<400> 2 

Asp Glu Leu Ile Xaa Cys 
1 5 

<210> 3 
<211> 1045 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 
<221> CDS 
<222> (0)...(297) 
<223> Protease 

<221> CDS 
<222> (298)...(1045) 
<223> Portion of Reverse Transcriptase 

<400> 3 

cct cag atc act ctt tgg caa cga ccc cty gtc aca ata aag ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Ile Gly 
1 5 10 15 

ggc caa cta aaa gaa gct yta tta gat aca gga gca gat gat aca gta 96 
Gly Gln Leu Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg agt tta cca ggg aaa tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Ser Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly 
35 40 45 

gga att gga ggt ttt atc aaa gta aga cag tat gat caa ata ctc ata 192 
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile 
50 55 60 

gaa atc tgt gga cat aaa gct ata ggc aca gta tta gta gga cct aca 240 
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ttg ttg act cag att ggt tgc act 288 
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttg ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Leu Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432 
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gag aat cca tac aat act cca ata ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe 
145 150 155 160 

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga aca caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cac ccc gca ggg tta aaa cag aaa aaa tca gta aca ata ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Gln Lys Lys Ser Val Thr Ile Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa ggc ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Gly Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt aga aat aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Arg Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aac gtg ctc cca cag gga tgg aaa gga tca cca 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttt caa agt agc atg aca aga aty tta gag cct ttt aga aaa 816 
Ala Ile Phe Gln Ser Ser Met Thr Arg Xaa Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gaa ata gtt atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata gga cag cat aga gca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu 
290 295 300 

aga gga cat cta tta aag tgg gga ttt acc aca cca gac aaa aaa cat 960 

Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aag aac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gac ttc tgg gag gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat cca gca ggg tta aaa aag aat aaa tca ata aca gta ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Ile Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca gtt ccc tta tgt gaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Cys Glu Asp Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt gta aac aat gag act cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Val Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga ttc acc 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Phe Thr 
245 250 255 

agc ata ttc caa tgt agc atg aca aaa atc tta gag cct ttt aga aaa 816 
Ser Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gag ata gtt atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata ggg cag cat aga gca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu 
290 295 300 

aga caa tat ctg tgg aag tgg gga ttt tgc aca cca gaa caa aar cat 960 
Arg Gln Tyr Leu Trp Lys Trp Gly Phe Cys Thr Pro Glu Gln Lys His 
305 310 315 320 

cag aaa gaa cct cct ttc ctt tgg atg ggt tat gaa ctc cat ccc gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta caa cct ata gtg ctg cca gac aaa ga 1046 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Asp Lys 
340 345 

<210> 5 
<211> 1104 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 
<221> CDS 
<222> (0)...(297) 
<223> HIV Protease 

<221> CDS 
<222> (298)...(1104) 
<223> Portion of HIV Reverse Transcriptase 

<400> 5 

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag rta ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Xaa Gly 
1 5 10 15 

ggg caa cta agg gaa gct cta tta gat aca gga gca gat gat aca ata 96 
Gly Gln Leu Arg Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Ile 
20 25 30 

ata gaa gac ata act ttg cca gga aga tgg aca cca aaa atg ata ggg 144 
Ile Glu Asp Ile Thr Leu Pro Gly Arg Trp Thr Pro Lys Met Ile Gly 
35 40 45 

gga att gga ggt ttt gtc aaa gta aga cag tat gat cag ata ccc ata 192 
Gly Ile Gly Gly Phe Val Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile 
50 55 60 

gaa atc tgc gga cat aaa rtt ata agt aca gta ttg gta gga cct aca 240 
Glu Ile Cys Gly His Lys Xaa Ile Ser Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cca ata aac ata gtt gga aga aat ctg atg act cag att ggt tgc act 288 
Pro Ile Asn Ile Val Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gtc aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aag gca tta gta gaa att tgt mca gaa ctg gaa atg gat gga 432 
Lys Ile Lys Ala Leu Val Glu Ile Cys Xaa Glu Leu Glu Met Asp Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat ccg tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aag aac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aac aaa aga act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat ccc gca ggg tta aag aag aaa aaa tca gta aca gta ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca att ccc tta tgt gaa gac ttc aga 672 
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Cys Glu Asp Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816 
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys 
260 265 270 

cag aat cca gaa atg gtc atc tat caa tac gtg gat gat ttg tat gta 864 
Gln Asn Pro Glu Met Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata gag cag cat aga aca aaa ata gat gaa ctg 912 
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Asp Glu Leu 
290 295 300 

aga caa tat ctg tgg aag tgg gga ttt tac aca cca gac aaa aaa cat 960 
Arg Gln Tyr Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aca gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Thr Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104 
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln 
355 360 365 

<210> 6 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 
<221> CDS 
<222> (0)...(297) 
<223> HIV Protease 

<221> CDS 
<222> (298)...(1116) 
<223> Portion of HIV Reverse Transcriptase 

<400> 6 

cct caa atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly 
1 5 10 15 

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96 
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gat atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Asp Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly 
35 40 45 

gga att gga ggt ttt atc aaa gta agg cag tat gat caa ata ctc ata 192 
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile 
50 55 60 

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240 
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga agg aat ctg ttg act cag att ggt tgc act 288 
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gag gag 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa atc tgt aca gaa atg gaa aag gaa ggg 432 
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aaa act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat ccc gca ggg tta aaa aag aaa aag tca gta aca gta ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 
210 215 220 

aag tac act gca ttt act ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt agc atg ata aaa atc tta gag cct ttc aga aaa 816 
Ala Ile Phe Gln Ser Ser Met Ile Lys Ile Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac atg gtc atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Asp Met Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata gga cag cac aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg aag tgg gga ttt acc aca cca gac aag aaa cat 960 
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata aag ctg cca gaa aaa gac agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Lys Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104 
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln 
355 360 365 

att tac cca ggg 1116 
Ile Tyr Pro Gly 
370 

<210> 7 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 
<221> CDS 
<222> (0)...(297) 
<223> HIV Protease 

<221> CDS 
<222> (298)...(1116) 
<223> Portion of HIV Reverse Transcriptase 

<400> 7 

cct cag atc act ctt tgg caa cga ccc ctt gtc aca ata aar ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly 
1 5 10 15 

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96 
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gag gaa atn aat tta cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Xaa Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly 
35 40 45 

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ctt gta 192 
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Val 
50 55 60 

gaa aty tgt gga cat aar gct ata ggt aca gta tta gta gga cct aca 240 
Glu Xaa Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

ccc gtc aac ata att gga aga aat ctg ttg act caa att ggt tgc act 288 
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

ccg gga atg gat ggc ccc aaa gtt aaa cat ggc cct ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys His Gly Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aag cct tta gtt gaa att tgt aca gaa atg gga aaa gaa ggg 432 
Lys Ile Lys Pro Leu Val Glu Ile Cys Thr Glu Met Gly Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aaa aga act caa gac tty tgg gaa gtc caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat ccc tca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624 
Ile Pro His Pro Ser Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca gtt ccc ttg gat gaa gac tta gag 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Leu Glu 
210 215 220 

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816 
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tca gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg ggg tgg ggg ttt acc aca cca gac aaa aaa cat 960 
Arg Gln His Leu Leu Gly Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca aca aaa gac agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Thr Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aac tgg gca agt cag 1104 
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln 
355 360 365 

att tat gca ggg 1116 
Ile Tyr Ala Gly 
370 

<210> 8 
<211> 1116 
<212> DNA 
<213> Human Immunodeficiency Virus (HIV) 

<220> 
<221> CDS 
<222> (0)...(297) 
<223> HIV Protease 

<221> CDS 
<222> (298)...(1116) 
<223> Portion of HIV Reverse Transcriptase 

<400> 8 

cct cag atc act ctt tgg caa cga ccc cty gtc aca gta aag ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Val Lys Ile Gly 
1 5 10 15 

ggg caa ata aag gaa gct yta tta gat aca gga gca gat gat aca gta 96 
Gly Gln Ile Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa ata ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Ile Ile Gly 
35 40 45 

gga att gga ggt ttt atc aaa gta aga cag tat gat cag gta ccc ata 192 
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Val Pro Ile 
50 55 60 

gaa atc tgt gga caa aaa gct ata agt aca gta tta gta gga cct aca 240 
Glu Ile Cys Gly Gln Lys Ala Ile Ser Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aat ata att gga aga aat ctg atg act cag att ggt tgc act 288 
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa atc tgt aca gaa atg gaa aag gaa ggg 432 
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aaa ggc agt aac aga tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Gly Ser Asn Arg Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aaa act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat ccc gca ggg cta aaa aag aaa aaa tca gta aca gta ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca caa gga tgg aaa ggg tca cca 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816 
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

agc tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa cta 912 
Ser Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg agg tgg gga tta acc aca cca gac aaa aaa cat 960 
Arg Gln His Leu Leu Arg Trp Gly Leu Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gar aaa gac agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt caa 1104 
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln 
355 360 365 

att tac cca ggg 1116 
Ile Tyr Pro Gly 
370 

<210> 9 
<211> 1116 
<212> DNA 
<213> Human Immunodeficiency Virus (HIV) 

<220> 
<221> CDS 
<222> (0)...(297) 
<223> HIV Protease 

<221> CDS 
<222> (298)...(1116) 
<223> Portion of HIV Reverse Transcriptase 

<400> 9 

cct cag atc act ctt tgg caa cga ccc cty gtc aaa gta aag ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Lys Val Lys Ile Gly 
1 5 10 15 

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96 
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly 
35 40 45 

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ccc ata 192 
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile 
50 55 60 

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240 
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata atw gga aga aat ctg ttg act cag att ggt tgc act 288 
Pro Val Asn Ile Xaa Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta ata gaa att tgt aca gag atg gag aag gaa ggg 432 
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aaa act caa gay ttc tgg gaa gtt car tta gga 576 
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt gta aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Val Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt agc atg aca aag atc tta gar cct ttt aga aaa 816 
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gaa ata gtt atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tcw gac tta gaa ata ggg caa cat aga ata aaa ata gag gaa ctg 912 
Gly Xaa Asp Leu Glu Ile Gly Gln His Arg Ile Lys Ile Glu Glu Leu 
290 295 300 

aga cag cat ctg tta agg tgg ggg ttt acc aca cca gac aaa aaa cat 960 
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggk tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Xaa Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gay agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104 
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln 
355 360 365 

atc tac cca ggg 1116 
Ile Tyr Pro Gly 
370 

<210> 10 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 
<221> CDS 
<222> (0)...(297) 
<223> HIV Protease 

<221> CDS 
<222> (298)...(1116) 
<223> Portion of HIV Reverse Transcriptase 

<400> 10 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca gta aag ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Val Lys Ile Gly 
1 5 10 15 

ggg caa ata aag gaa gct yta tta gat aca gga gca gat gat aca gta 96 
Gly Gln Ile Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atw ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Xaa Ile Gly 
35 40 45 

gga att gga ggt ttt atc aaa gta aga cag tat gat cag gta ccc ata 192 
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Val Pro Ile 
50 55 60 

gaa atc tgt gga caa aaa gct ata agt aca gta tta gta gga cct aca 240 
Glu Ile Cys Gly Gln Lys Ala Ile Ser Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aat ata att gga aga aat ctg atg act cag att ggt tgc act 288 
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt cct att agt cct att gaa act gta cca gta aaa taa aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys * Lys 
100 105 110 

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa atc tgt aca gaa atg gaa aag gaa ggg 432 
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 

gcc ata aag aaa aaa ggc agt aac aga tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Gly Ser Asn Arg Trp Arg Lys Leu Val Asp Phe 
160 165 170 175 

aga gaa ctt aat aag aaa act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat ccc gca ggg cta aaa aag aaa aaa tca gta aca gta ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly 
225 230 235 

att aga tat cag tac aat gtg ctt ccm caa gga tgg aaa ggg tca cca 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Xaa Gln Gly Trp Lys Gly Ser Pro 
240 245 250 255 

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816 
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac wtr gtt atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Asp Xaa Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

agc tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa cta 912 
Ser Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg agg tgg gga tta acc aca cca gac aaa aaa cat 960 
Arg Gln His Leu Leu Arg Trp Gly Leu Thr Thr Pro Asp Lys Lys His 
305 310 315 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
320 325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gag aaa gac agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt caa 1104 
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln 
355 360 365 

att tac cca ggg 1116 
Ile Tyr Pro Gly 
370 

<210> 11 
<211> 1116 
<212> DNA 

<213> Human Immunodif iciency Virus (HIV) 

<220> 

<221> CDS 

<222> (0) . .". (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 11 

cct cag atc act ctt tgg caa cga ccc aty gtt aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aaa raa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Xaa Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata gtg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Val
35 40 45

gga att gga ggt ttt gtc aaa gta aga cag tat gat cag gta ccc ata 192
Gly Ile Gly Gly Phe Val Lys Val Arg Gln Tyr Asp Gln Val Pro Ile
50 55 60

gag atc tgt ggg cat aaa att ata ggt aca gta tta ata gga cct acc 240
Glu Ile Cys Gly His Lys Ile Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gcc aac gta att gga aga aat ctg atg act cag ctt ggt tgc act 288
Pro Ala Asn Val Ile Gly Arg Asn Leu Met Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt yct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Xaa Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt gca gaa ctg gag aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Ala Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aga att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Arg Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aag aaa aac agt act agg tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa att caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Ile Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggg gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa gat agc atg aca aaa atc tta gat ccc ttt aga aag 816
Ala Ile Phe Gln Asp Ser Met Thr Lys Ile Leu Asp Pro Phe Arg Lys
260 265 270

aaa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Lys Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac yta gaa ata gag cag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Xaa Glu Ile Glu Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga gaa tat ctg tta aag tgg gga ttt ttc aca cca gag caa aaa cat 960
Arg Glu Tyr Leu Leu Lys Trp Gly Phe Phe Thr Pro Glu Gln Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggc tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtg cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aac tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 12 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
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<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) ... (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 12 

cct caa atc act ctt tgg car cga ccc tta gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gcc cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

cta gaa gaa atg aat ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta agg cag tat gat car ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gag atc tgc ggg tat aaa gct gtg ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly Tyr Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act caa att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta ata gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac ggt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Gly Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat caa gac ttc aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Gln Asp Phe Arg
210 215 220

aag tat act gca ttc act ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac atg gtt atc tat caa tat atg gat gat ttg tat gta 864
Gln Asn Pro Asp Met Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

ggc tct gac tta gaa aya ggg cag cat aga rca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Xaa Gly Gln His Arg Xaa Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata atg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Met Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag cta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370



<210> 13
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0) . . . (297)
<223> HIV Protease

<221> CDS
<222> (298) . . . (1116)
<223> Portion of HIV Reverse Transcriptase
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<400> 13 

cct cag atc act ctt tgg caa cga ccc aty gtc aac ata aag gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Asn Ile Lys Val Gly
1 5 10 15

ggg caa cta arg gaa gct cta ata gat aca gga gca gat gat aca gta 96
Gly Gln Leu Xaa Glu Ala Leu Ile Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac ata gat ttg cca gga aga tgg aga cca aga atg ata ggg 144
Leu Glu Asp Ile Asp Leu Pro Gly Arg Trp Arg Pro Arg Met Ile Gly
35 40 45

gga att gga ggt ttt gtc aaa gta aag cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Val Lys Val Lys Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa ata tgt gga cat aaa gtt ata ggt aca gta tta gta gga cct acg 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag att ggg tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aaa 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aag ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aag aaa aac agt act aga tgg aga aaa tta gta gat ttt 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttt tgt gaa gtg caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Cys Glu Val Gln Leu Gly
180 185 190

ata ccg cat ccc gca ggg tta ara aag aaa aga tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Xaa Lys Lys Arg Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gcc ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tat aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc cta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca grc ata gtt atc gtt caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Xaa Ile Val Ile Val Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

ggg tct gac tta gaa ata ggg caa cat aga gca aaa ata gag gag ttg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga gaa cat ctg ttg agg tgg gga tty ttc aca cca gac gaa aaa cat 960
Arg Glu His Leu Leu Arg Trp Gly Phe Phe Thr Pro Asp Glu Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cac cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg acc gta cag cct ata aat ttg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Asn Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac tca ggg 1116
Ile Tyr Ser Gly
370

<210> 14
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0) . . . (297)
<223> HIV Protease

<221> CDS
<222> (298) . . . (1116)
<223> Portion of HIV Reverse Transcriptase
<400> 14

cct caa atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa gta agg gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Val Arg Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aaa tgg aag cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att ggg ggc ttt atc aaa gta aga cag tat gat caa ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggg aca gtg tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt cct att agt cct att gaa act gtg cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggt cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca ttg ata gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aag att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa aac agt act aga tgg agg aaa cta gta gac ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttt tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca gga tta aaa aag aga aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Arg Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aar gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aaa tac act gca ttc acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctg cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

tca ata ttc caa agt agy atg aca aaa atc tta gag cct ttt aga aag 816
Ser Ile Phe Gln Ser Xaa Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gat atc tgt caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Asp Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga gca aaa ata rag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Xaa Glu Leu
290 295 300

aga gag cat ctg cta aag tgg gga ttt acc aca cca gac raa aaa cat 960
Arg Glu His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Xaa Lys His
305 310 315 320

car aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctt cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta caa cat ata gag cta cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln His Ile Glu Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata caa aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 15 

<211> 1116 

<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 15 

cct caa atc act ctt tgg car cga ccc ctc gtt gca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Ala Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta kaa gaa atg gat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Xaa Glu Met Asp Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag gta tcc wta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Val Ser Xaa
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tat aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa ttg gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac ttc tgg gar gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat cca gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Pro Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

atc aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa agy agc atg ata aga aty tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Xaa Ser Met Ile Arg Xaa Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gat tta gaa ata gaa cag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tta agg tgg gga ttt acc aca cca gay aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggk tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Xaa Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag ttr gtg gga aaa ttr aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Xaa Val Gly Lys Xaa Asn Trp Ala Ser Gln
355 360 365

att tac tca ggg 1116
Ile Tyr Ser Gly
370

<210> 16 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 16 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gag gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac atg act ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Met Thr Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta rta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Xaa Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aag att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aar gat ggt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Gly Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa att caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Ile Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aag tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gca ttt act ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa tgt agc atg aca aaa atc tta gag cct ttt aga aag 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga rca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Xaa Lys Ile Glu Glu Leu
290 295 300

agg caa cat ctg ttg aag tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cca gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca caa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Gln Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370

<210> 17 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 17 

cct caa atc act ctt tgg caa cga ccc aty gtc aca ata aag gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Val Gly
1 5 10 15

ggg caa cta aag gaa gcc cta ata gat aca gga gca gat gat aca gtg 96
Gly Gln Leu Lys Glu Ala Leu Ile Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa ttg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Leu Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag rta ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Xaa Pro Ile
50 55 60

gaa atc tgt gga cat aaa gct gta ggt tca gtg tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Val Gly Ser Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

cta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca aaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Lys Glu
115 120 125

aaa ata gaa gca tta gta gaa atc tgt gca gaa ctg gaa gag gca ggg 432
Lys Ile Glu Ala Leu Val Glu Ile Cys Ala Glu Leu Glu Glu Ala Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aar aag aac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aac aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttc tca att ccc tta gat aag gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt aca ata cct agy ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Xaa Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cma cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Xaa Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc cag tgt agc atg aca aaa atc tta gat cct ttt aga aaa 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Asp Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg car cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa yat ctg tgg aag tgg gga ttt tac aca cca gag aat aaa cat 960
Arg Gln Xaa Leu Trp Lys Trp Gly Phe Tyr Thr Pro Glu Asn Lys His
305 310 315 320

cag aaa gaa cct cca ttc cwt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Xaa Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aaa tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365
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cct gcc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 
Pro Ala Asn He He Gly Arg Asn Leu Leu Thr Gin He Gly Cys Thr 
85 * 90 95 



cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 
Pro Gly Met Asp Gly Pro Arg Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 1 120 125 



gey ata cac aag aaa aat agt aat aga tgg aga aaa gta gta gat ttc 
Xaa He His Lys Lys Asn Ser Asn Arg Trp Arg Lys Val Val Asp Phe 
165 170 175 



288 



tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro lie Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 HO 



384 



aaa ata aaa gca tta aca gaa ate tgt wca gag atg gaa aag gaa ggg 432 

Lys He Lys Ala Leu Thr Glu He Cys Xaa Glu Met Glu Lys Glu Gly 

130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aac act cca gta ttt 480 

Lys He Ser Lys He Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 

145 150 155 160 



528 



agg gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576 

Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 

180 185 190 

ata cca cat ccc gca gga tta aaa aag aac aaa tea gta aca gta ctg 624 

lie Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu 

195 " 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat aag gat ttc agg 672 

Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 

210 215 220 

aag tat act gcg ttt acc ata cct agt ata aac aat gag aca cca ggg 720 

Lys Tyr Thr Ala Phe Thr He Pro Ser He Asn Asn Glu Thr Pro Gly 

225 230 235 240 

ate aga tac cag tac aat gtg ctt cca caa gga tgg aaa gga tea cca 768 

He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aga ate tta gag cct ttt aga aaa 816 

Ala He Phe Gin Ser Ser Met Thr Arg He Leu Glu Pro Phe Arg Lys 

260 265 270 

caa aat cca gaa ata gtt ate tgt caa tac atg gat gat ttg tat gta 864 

Gin Asn Pro Glu He Val He Cys Gin Tyr Met Asp Asp Leu Tyr Val 

275 280 285 

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata aak gaa ctg 912 

Gly Ser Asp Leu Glu He Gly Gin His Arg Thr Lys He Xaa Glu Leu 

290 295 300 

aga saa cat ctg ttg agg tgg gga ttt ttc aca cca gac caa aaa cat 960 

Arg Xaa His Leu Leu Arg Trp Gly Phe Phe Thr Pro Asp Gin Lys His 

305 310 315 320 
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att tat gen ggg 1116 
He Tyr Ala Gly 
370 

<210> 18 
<211> 1117 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1117) 

<223> Portion of HIV Reverse Transcriptase 



<400> 18 

cct caa atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg car cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

gta gaa gaa atg aat tta tca gga agg tgg aaa cca aaa atg ata ggg 144
Val Glu Glu Met Asn Leu Ser Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga saa tat gaa cag ata cct gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Xaa Tyr Glu Gln Ile Pro Val
50 55 60

gaa att tgt gga cat aaa gct gta ggt aca gta tta gtg gga cct aca 240
Glu Ile Cys Gly His Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt ccc att gaa act gta cca gta aaa ttg aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc ccg aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt aat aaa tgg agg aaa tta gtg gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Asn Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta ggg 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccy tca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Xaa Ser Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tac ttt tca gtt ccc tta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att agr tat cag tac aat gtg ctg cca cag gga tgg aaa gga tca cca 768
Ile Xaa Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga gaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Glu
260 265 270

caa aat aca gac ata gtt atc tgt caa tac atg gat gat ttg tat gta 864
Gln Asn Thr Asp Ile Val Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga gca aaa gtr gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Xaa Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga yta acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Xaa Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc cgt tgg atg ggk tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Arg Trp Met Xaa Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtr caa cct ata gag ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Xaa Gln Pro Ile Glu Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata caa aaa gtt agt ggg aaa att aaa ttg ggc aag tca 1104
Val Asn Asp Ile Gln Lys Val Ser Gly Lys Ile Lys Leu Gly Lys Ser
355 360 365

gat tta ccc agg g 1117
Asp Leu Pro Arg
370



<210> 19 
<211> 1116 



WO 01/35316 



PCT/US00/30863 






<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 19 

cct cag atc act ctt tgg caa cga ccc cty gtc aca gta aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Val Lys Ile Gly
1 5 10 15

ggg caa cta acg gaa gct yta ttg gat aca gga gca gat aat aca gta 96
Gly Gln Leu Thr Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asn Thr Val
20 25 30

tta gaa gaa atg agt ttr cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Ser Xaa Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gta gta ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Val Val Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga gat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asp Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gtg aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ctg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aar gac agt act aaa tgg aga aaa ttr gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Xaa Val Asp Phe
165 170 175

aga gaa ctt aat aaa aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc tca ggg tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ser Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gac gtg ggt gat gca tat ttc tca gtt ccc cta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttc acc ata cct agt gta aac aat gag act cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Val Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctg cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

cac aat cca aac ata gtt atc tat caa tac gtg gat gat tta tat gta 864
His Asn Pro Asn Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa gta gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Val Glu Glu Leu
290 295 300

aga caa cat ctg ttg aag tgg ggg ttt tac aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtg cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata caa aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 20 
<211> 1117 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1117)
<223> Portion of HIV Reverse Transcriptase
<400> 20 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata gga 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac ata aat ttg cca ggg aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Ile Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat caa ata cca gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Val
50 55 60

gaa att tgt gga cat aaa gct gta ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Val Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac gta att gga aga aat ctg atg act cag att ggc tgc act 288
Pro Val Asn Val Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggt cca aaa gtt aaa caa tgg cca tta aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgc aca gaa ttg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att gga cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca ata ccc tta gat gaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Asp Glu Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt cca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Pro Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat caa tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa tgt agt atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

gaa aat cca gat ata gtt atc tac caa tac atg gat gac tta tat gta 864
Glu Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gat tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa tat ctg tgg aag tgg gga ttt tac aca cca gac aaa aaa cat 960
Arg Gln Tyr Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag caa gaa cct cca ttc cgt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Gln Glu Pro Pro Phe Arg Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag ttt agt ggg aaa att gaa ttg ggc aag tca 1104
Val Asn Asp Ile Gln Lys Phe Ser Gly Lys Ile Glu Leu Gly Lys Ser
355 360 365

gat tta tgc agg g 1117
Asp Leu Cys Arg
370



<210> 21 

<211> 1116 

<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase



<400> 21 

cct cag atc act ctt tgg caa cga mcc gtt gtc wca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Xaa Val Val Xaa Ile Lys Ile Gly
1 5 10 15

ggg caa cta aaa gaa gct cta tta gay aca ggg gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac atg cat ttg cca ggt aga tgg aaa cca aaa atg ata gtg 144
Leu Glu Asp Met His Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Val
35 40 45

gga att ggg ggt ttt gtc aaa gta aga cag tat gat cag ata cct gta 192
Gly Ile Gly Gly Phe Val Lys Val Arg Gln Tyr Asp Gln Ile Pro Val
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cca gcc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttc ccc atc agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa att aga caa tgg cca tta aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Ile Arg Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa atc tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa aat agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt atg aac aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Met Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca atg gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Met Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agt atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

cag aat cca gac ata gtc atc tat caa tac atg gat gat tta tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tcg gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ttg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg aga tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtg cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtt aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt caa 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 22 
<211> 1116 
<212> DNA 



<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 22 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag gta gga 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Val Gly
1 5 10 15

ggg caa cta aag gag gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac ata gat ttg cca gga agr tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Ile Asp Leu Pro Gly Xaa Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa ata tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cgg att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Arg Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttt 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gat ttc tgg gaa gtg caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata ccg cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aar gay ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gcc ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tat aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc cta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac atg gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Met Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285






ggg tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga gaa cat ctg ttg agg tgg gga ttt acc acc cca gac aaa aaa cat 960
Arg Glu His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gag cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg acc gtr cag cct ata gag ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Xaa Gln Pro Ile Glu Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 23 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 23 

cct cag atc act ctt tgg caa cga ccc ata gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta ata gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Ile Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac ata aat ttg cca gga aga tgg aaa cca aaa tta ata ggg 144
Leu Glu Asp Ile Asn Leu Pro Gly Arg Trp Lys Pro Lys Leu Ile Gly
35 40 45

gga att gga ggt ttt gtc aga gtg aaa cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Val Arg Val Lys Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa att tgt gga cat aaa gtt ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aar gac agt tgg acw 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Xaa
340 345 350

gty aat gac ata cag aaa tta gtk gga aaa ttg aat tgg gca agt caa 1104
Xaa Asn Asp Ile Gln Lys Leu Xaa Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 24 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 24 

cct cag atc act ctt tgg caa cga ccc ata gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta ata gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Ile Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac ata aat ttg cca gga aga tgg aaa cca aaa tta ata ggg 144
Leu Glu Asp Ile Asn Leu Pro Gly Arg Trp Lys Pro Lys Leu Ile Gly
35 40 45

gga att gga ggt ttt gtc aga gtg aaa cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Val Arg Val Lys Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa att tgt gga cat aaa gtt ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta aca gaa atc tgt wca gag atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Thr Glu Ile Cys Xaa Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aac act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcy ata cac aag aaa aat agt aat aga tgg aga aaa gta gta gat ttc 528
Xaa Ile His Lys Lys Asn Ser Asn Arg Trp Arg Lys Val Val Asp Phe
165 170 175

agg gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca gga tta aaa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aag gat ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gcg ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

atc aga tac cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aga atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Arg Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tgt caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata aak gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Xaa Glu Leu
290 295 300

aga saa cat ctg ttg agg tgg gga ttt ttc aca cca gac caa aaa cat 960
Arg Xaa His Leu Leu Arg Trp Gly Phe Phe Thr Pro Asp Gln Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aar gac agt tgg acw 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Xaa
340 345 350

gty aat gac ata cag aaa tta gtk gga aaa ttg aat tgg gca agt caa 1104
Xaa Asn Asp Ile Gln Lys Leu Xaa Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 25 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 25 

cct caa atc act ctt tgg caa cga ccc ctc gtc aca ata aaa ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta cta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg agt ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Ser Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag gta tcc atg 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Val Ser Met
50 55 60

gaa atc tgt gga cat aaa gtt ata ggt aca gta tta gta gga tct aca 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Ser Thr
65 70 75 80

cct gtc aac ata att gga aga aat ytg ttg act cag ctt ggg tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Xaa Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta ata gaa att tgt aca gaa atg gaa aag gar ggg 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aag tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gat ttc tgg gaa rtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Xaa Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta caa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Gln Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtc ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa tat agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Tyr Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tac caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gaa cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga cag cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctc tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtt cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370
<210> 26
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 26 

cct cag atc act ctt tgg caa cga ccc atc gtc gaa ata aag gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Glu Ile Lys Val Gly
1 5 10 15

ggg caa cta ata gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Ile Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa ata aat tta cca gga aga tgg aaa cca aga atg ata ggg 144
Leu Glu Glu Ile Asn Leu Pro Gly Arg Trp Lys Pro Arg Met Ile Gly
35 40 45

gga att gga ggt ttt gtc aaa gta aga cag tat gat cag gta cct atc 192
Gly Ile Gly Gly Phe Val Lys Val Arg Gln Tyr Asp Gln Val Pro Ile
50 55 60

gaa atc tgt gga cat aaa gtt ata agt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Ser Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg atg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aaa 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ytg gaa gag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Xaa Glu Glu Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aag aaa nnn agt ggt aga tgg aga aaa ata gta gat ttt 528
Ala Ile Lys Lys Lys Xaa Ser Gly Arg Trp Arg Lys Ile Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gat ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aac aag tca gta aca att ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Ile Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aag gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aat aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat cag tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gat tta gaa ata ggg gag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Glu His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga car cat ctg tta arg tgg gga ttt ttc aca cca gaa caa aaa cat 960
Arg Gln His Leu Leu Xaa Trp Gly Phe Phe Thr Pro Glu Gln Lys His
305 310 315 320

cag aaa gaa cct ccm ttc cak tgg atg ggt tat gaa ctc cay cct gat 1008
Gln Lys Glu Pro Xaa Phe Xaa Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cas cct ata gtg ctg cca gaa aaa gat agc tgg act 1056
Lys Trp Thr Val Xaa Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 27 
<211> 1113 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 27 

cct cag atc act ctt tgg caa cga ccc atc gtc gaa ata aag gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Glu Ile Lys Val Gly
1 5 10 15

ggg caa cta ata gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Ile Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa ata aat tta cca gga aga tgg aaa cca aga atg ata ggg 144
Leu Glu Glu Ile Asn Leu Pro Gly Arg Trp Lys Pro Arg Met Ile Gly
35 40 45

gga att gga ggt ttt gtc aaa gta aga cag tat gat cag gta cct atc 192
Gly Ile Gly Gly Phe Val Lys Val Arg Gln Tyr Asp Gln Val Pro Ile
50 55 60

gaa atc tgt gga cat aaa gtt ata agt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Ser Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg atg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aaa 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ytg gaa gag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Xaa Glu Glu Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160



gcc ata aag aag aaa agt ggt aga tgg aga aaa ata gta gat ttt aga 528
Ala Ile Lys Lys Lys Ser Gly Arg Trp Arg Lys Ile Val Asp Phe Arg
165 170 175

gaa ctt aat aag aga act caa gat ttc tgg gaa gtt caa tta gga ata 576
Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly Ile
180 185 190

cca cat ccc gca ggg tta aaa aag aac aag tca gta aca att ctg gat 624
Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Ile Leu Asp
195 200 205

gtg ggt gat gca tat ttt tca gtt ccc tta gat aag gaa ttc agg aag 672
Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg Lys
210 215 220

tat act gca ttt acc ata cct agt ata aat aat gag aca cca ggg att 720
Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly Ile
225 230 235 240

aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca gca 768
Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro Ala
245 250 255

ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa caa 816
Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys Gln
260 265 270

aat cca gac ata gtt atc tat cag tac gtg gat gat ttg tat gta gga 864
Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val Gly
275 280 285

tct gat tta gaa ata ggg gag cat aga aca aaa ata gag gaa ctg aga 912
Ser Asp Leu Glu Ile Gly Glu His Arg Thr Lys Ile Glu Glu Leu Arg
290 295 300

car cat ctg tta arg tgg gga ttt ttc aca cca gaa caa aaa cat cag 960
Gln His Leu Leu Xaa Trp Gly Phe Phe Thr Pro Glu Gln Lys His Gln
305 310 315 320

aaa gaa cct ccm ttc cak tgg atg ggt tat gaa ctc cay cct gat aaa 1008
Lys Glu Pro Xaa Phe Xaa Trp Met Gly Tyr Glu Leu His Pro Asp Lys
325 330 335

tgg aca gta cas cct ata gtg ctg cca gaa aaa gat agc tgg act gtc 1056
Trp Thr Val Xaa Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr Val
340 345 350

aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag att 1104
Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln Ile
355 360 365

tac cca ggg 1113
Tyr Pro Gly
370

<210> 28 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 28

cct caa atc act stt tgg caa cga ccc aty gtc tca ata aag ata ggg 48
Pro Gln Ile Thr Xaa Trp Gln Arg Pro Xaa Val Ser Ile Lys Ile Gly
1 5 10 15

ggg caa ata aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Ile Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aag cca aaa atg ata gtg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Val
35 40 45

gga att gga ggt ttt agc aaa gta aga caa tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ser Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgc gga cgt aaa gtt gta ggt tca gta tta ata gga cct aca 240
Glu Ile Cys Gly Arg Lys Val Val Gly Ser Val Leu Ile Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag ctt ggc tgt act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct atk gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Xaa Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca aaa gag 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Lys Glu
115 120 125

aaa ata aaa gca tta ata gaa att tgt aca gaa ttg gaa gaa gma gga 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Leu Glu Glu Xaa Gly
130 135 140

aaa att aca aaa att ggg cct gaa aat ccg tac aat act cca ata ttt 480
Lys Ile Thr Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aaa aar aac agt act aaa tgg aga aaa tta gta gac ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

agg gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca att ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Asp Lys Asp Phe Arg
210 215 220

aar tat act gca ttt acc ata cct agt acg aat aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tat aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg ctt gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Leu Val
275 280 285

gga tct gac tta gaa ata gag cag cat aga aca aaa ata gag gag cta 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tgg aag tgg gga ttt tac aca cca gac aaa aaa cat 960
Arg Gln His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gka cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Xaa Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata caa aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370



<210> 29 
<211> 1116 
<212> DNA 



<213> Human Immunodeficiency Virus (HIV)



<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 29 

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta ata gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Ile Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa ata ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Ile Ile Gly
35 40 45

gga att gga ggt ctt gtc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Leu Val Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gtt ata ggt aca gtw tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Xaa Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca ggg atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gag aag gag gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aag att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag aac agt act agg tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aac aaa tca gca aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Ala Thr Val Leu
195 200 205

gat gtg ggc gat gca tat ttt tca gtt ccc tta gac aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttt acy ata cct agt ata aac aat gaa aca cca ggg 720
Lys Tyr Thr Ala Phe Xaa Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

tar ata tca gtg tac aat gtr ctt cca caa gga tgg aaa gga tca cma 768
Xaa Ile Ser Val Tyr Asn Xaa Leu Pro Gln Gly Trp Lys Gly Ser Xaa
245 250 255

gca ata ttc maa agt agc atg aca aga atc tta gag cct ttt aga aaa 816
Ala Ile Phe Xaa Ser Ser Met Thr Arg Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa gta gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Val Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt ttc aca cca gac caa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Phe Thr Pro Asp Gln Lys His
305 310 315 320

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac gcn ggg 1116
Ile Tyr Ala Gly
370

<210> 30 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 30 

cct caa atc act ctt tgg caa cga ccc cty gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct yta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg agc tta cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Ser Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45



gga att gga ggk ttt atc aaa gtg agm cag tat gat cag ata ctc ata 192
Gly Ile Gly Xaa Phe Ile Lys Val Xaa Gln Tyr Asp Gln Ile Leu Ile
50 55 60

gaa aty tgt gga cat aaa gct ata ggt aca gtr tta ata gga cct aca 240
Glu Xaa Cys Gly His Lys Ala Ile Gly Thr Xaa Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa ttg aaa 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtc aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggr 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Xaa
130 135 140

aaa att aca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Thr Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aag aaa aac agt gat aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Asp Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cca gca ggg tta aaa cag aaa aag tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Gln Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gta ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt gta aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Val Asn Asn Glu Thr Pro Gly
225 230 235 240

gtt aga tat cag tac aat gta ctc cca cag gga tgg aaa gga tca cca 768
Val Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa agt agc atg aca aaa atc tta gag cct ttt agg aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttc tac aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gta ggg aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca gga 1116
Ile Tyr Ala Gly
370

<210> 31 
<211> 1117 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 31 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa tta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

cta gaa gac gtg cat ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Val His Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat gag gta ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Glu Val Pro Ile
50 55 60

gaa ctc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Leu Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

ccc gtc aac ata att gga aga aat ctg wtg act caa ctt ggg tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Xaa Thr Gln Leu Gly Cys Thr
85 90 95

cta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aga gtt ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Arg Val Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gyc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Xaa Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cay ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctr 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Xaa
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tac cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gat cct ttt agg aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Asp Pro Phe Arg Lys
260 265 270

caa aac cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tcy gac tta gaa ata gga cag cat agr rca aaa ata gaa gaa ctg 912
Gly Xaa Asp Leu Glu Ile Gly Gln His Xaa Xaa Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg aag tgg gga ttt acc aca cca gac aag aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

car aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtg cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ant aca gaa gtt agt ggg aaa att gaa ttg ggc aag tca 1104
Val Asn Asp Xaa Thr Glu Val Ser Gly Lys Ile Glu Leu Gly Lys Ser
355 360 365

gat tta tgc agg g 1117
Asp Leu Cys Arg
370

<210> 32 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 32 

cct caa atc act ctt tgg caa cga ccc cty gtc gca ata agg ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Ala Ile Arg Ile Gly
1 5 10 15

ggg caa cta aag gaa gcc cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac atg gag ttg cca gga aga tgg aag cca aaa atg ata ggg 144
Leu Glu Asp Met Glu Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aam cag tat gat cag ata ctt gta 192
Gly Ile Gly Gly Phe Ile Lys Val Xaa Gln Tyr Asp Gln Ile Leu Val
50 55 60

gaa atc tgt gga cat aaa gct gta ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Val Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ttg ttg act cag att ggc tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110



cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gag 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa atc tgt aca gaa ttg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aaa aga act caa gac ttt tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tcc gtg aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttt aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc aya cct sgt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Xaa Pro Xaa Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tcc cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa agc agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac wta gtt wtc tat caa twc ata gat gat ctg tat gta 864
Gln Asn Pro Asp Xaa Val Xaa Tyr Gln Xaa Ile Asp Asp Leu Tyr Val
275 280 285

ggc tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga cag cat ctg tgg aag tgg ggg ttt tac aca cca gac aaa aaa cat 960
Arg Gln His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata atg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Met Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aar tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 33 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297)

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 33 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta kat aca gga gca gat gat aca gtm 96
Gly Gln Leu Lys Glu Ala Leu Leu Xaa Thr Gly Ala Asp Asp Thr Xaa
20 25 30

tta gaa gac atg act ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Met Thr Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aaa cag tat gag gag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Lys Gln Tyr Glu Glu Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ttg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aaa 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca ttw gta gaa att tgt gca gaa ctg gaa aag gaa ggg 432
Lys Ile Lys Ala Xaa Val Glu Ile Cys Ala Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac ggt act aaa tgg aga aag gta aca gat ttt 528
Ala Ile Lys Lys Lys Asp Gly Thr Lys Trp Arg Lys Val Thr Asp Phe
165 170 175

aga gaa ctt aat aag agg ach caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Xaa Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc tca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ser Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gcg aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Ala Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac atg gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Met Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gat tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg aag tgg ggt ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat tca ggg 1116
Ile Tyr Ser Gly
370



<210> 34 
<211> 1119 
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1119) 

<223> Portion of HIV Reverse Transcriptase 
<400> 34 

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta ttr gac aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Xaa Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa ata ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Ile Ile Gly
35 40 45

gga att gga ggt ttt att aaa gta aaa cag tat gaa cag ata acc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Lys Gln Tyr Glu Gln Ile Thr Ile
50 55 60

gam atc tgt gga cat aaa gct aca ggt aca gta tta gta gga cct aca 240
Xaa Ile Cys Gly His Lys Ala Thr Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac gta att gga aga aat atg atg act cag att ggt tgc act 288
Pro Val Asn Val Ile Gly Arg Asn Met Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aac aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta cca aag aac aaa tca gta acg gta ctg 624
Ile Pro His Pro Ala Gly Leu Pro Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt cct tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agg tat aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Arg Tyr Asn Asn Glu Thr Pro Gly
225 230 235 240

act aga tat cag tac aat gtg ctt cct atg gga tgg aaa gga tca cca 768
Thr Arg Tyr Gln Tyr Asn Val Leu Pro Met Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aga 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Arg
260 265 270

caa aat cca gac ata gtt atc tat caa tac gtg gat gac ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gag ata ggg cag cat aga gcg aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga gaa cat ctg tgg aag tgg ggt ttt tac aca cca gac aaa aaa cat 960
Arg Glu His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc cat tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe His Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aaa tta gtg ggr aaa att gaa ttt ggg cga gtc 1104
Val Asn Asp Ile Gln Lys Leu Val Xaa Lys Ile Glu Phe Gly Arg Val
355 360 365

aga ttt amc caa ggg 1119
Arg Phe Xaa Gln Gly
370

<210> 35 
<211> 1115 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 






<221> CDS 

<222> (298) . . . (1115) 

<223> Portion of HIV Reverse Transcriptase 
<400> 35 

cct cag atc act ctt tgg caa cga ccc cty gtc cca ata arg ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Pro Ile Xaa Ile Gly
1 5 10 15

ggg caa tta aag gaa gct cta cta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac atg aat tta cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aar gta aaa cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Lys Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt ggg cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

cta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att gga cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttt tgg gaa gtc caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta tta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg gga gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aag 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtc ata tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

ggg tct gac tta gaa ata gga cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cac ttg ttg maa tgg gga ttc acc aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Xaa Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata kaa ctg cca gaa aaa gac agc tgg ctg 1056
Lys Trp Thr Val Gln Pro Ile Xaa Leu Pro Glu Lys Asp Ser Trp Leu
340 345 350

tca atg aca tac aga aat tag tgg gaa agt tga att ggg caa gtc aaa 1104
Ser Met Thr Tyr Arg Asn * Trp Glu Ser * Ile Gly Gln Val Lys
355 360 365

ttt atg cng gg 1115
Phe Met Xaa

<210> 36
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0) . . . (297)

<223> HIV Protease

<221> CDS

<222> (298) . . . (1116)

<223> Portion of HIV Reverse Transcriptase
<400> 36

cct cag atc act ctt tgg caa cga cca gtc gtc aca ata aag gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Val Val Thr Ile Lys Val Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt rtc aaa gta aga cag tat gat caa ata ccc ata 192
Gly Ile Gly Gly Phe Xaa Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gct aca ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Thr Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gyc aac ata att gga aga aat ctg ttg act cag att ggg tgc act 288
Pro Xaa Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ctg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt gca gaa ttg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Ala Glu Leu Glu Lys Glu Gly
130 135 140

aag att tca aaa att ggg ccy gaa aat cca tac aay act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Xaa Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gec ata aag aaa aar aac agt act ara tgg aga aaa kta gta gat ttc 528 
Ala He Lys Lys Lys Asn Ser Thr Xaa Trp Arg Lys Xaa Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gat ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat ccc gca ggg eta aag aag aaa aaa tea gta aca gta ctg 624 
He Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc ttg gat gaa gac ttc aga 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg 
210 215 220 

aag tat aca gee ttt ace tat act ggt tec aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Tyr Thr Gly Ser Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat car tac aat gtg ctt cca cag gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 
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gca ata ttc caa age age atg aca aaa gtc tea gaa cct ttt aga aaa 816 
Ala lie Phe Gin Ser Ser Met Thr Lys Val Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac ata gtt ate tgt caa tac atg gat gat ttg tat gta 864 
Gin Asn Pro Asp lie Val lie Cys Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu lie Gly Gin His Arg Thr Lys lie Glu Glu Leu 
290 295 300 

aga caa cat ctg tta agg tgg gga ttt tac aca cca gac gaa aaa cat 960 
Arg Gin His Leu Leu Arg Trp Gly Phe Tyr Thr Pro Asp Glu Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa etc cat cct gac 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac age tgg act 1056 
Lys Trp Thr Val Gin Pro lie Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtt aat gac ata cag aaa tta gtg gga aaa ttg aat tgg gec agt cag 1104 
Val Asn Asp lie Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tac cca ggg 1116 
lie Tyr Pro Gly 
370 

<210> 37 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 37 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aaa ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly 
1 5 10 15 

ggg caa eta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gac atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Asp Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 
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aag tat act gca ttt acc ata cct agt gta aac aat gag aca cca kgg 720 
Lys Tyr Thr Ala Phe Thr lie Pro Ser Val Asn Asn Glu Thr Pro Xaa 
225 230 235 240 

att aga tay cag tac aat gtg ctt cca caa gga tgg aaa gga tea cca 768 
lie Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata tty caa tgt age atg aca aaa ate tta gag cct ttt aga aag 816 
Ala He Phe Gin Cys Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac eta gtt att tat caa tac atg gat gat ttg tat gta 864 
Gin Asn Pro Asp Leu Val He Tyr Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu He Gly Gin His Arg Thr Lys He Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg ara tgg gga ttt acc aca cca gac aaa aaa cat 960 
Arg Gin His Leu Leu Xaa Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg gca gtg caa cct ata gtg ctg cca gaa aaa gac age tgg 1053 
Lys Trp Ala Val Gin Pro He Val Leu Pro Glu Lys Asp Ser Trp 
340 345 350 

<210> 43 
<211> 1082 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 



<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 



<221> CDS 

<222> (298) . . . (1082) 

<223> Portion of HIV Reverse Transcriptase 
<400> 43 

cct caa atc act ctt tgg caa cga ccc ctt gtc aca rta aag rta ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Xaa Lys Xaa Gly 
1 5 10 15 

ggg caa eta aag gaa get yta ttr gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Xaa Xaa Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat tta cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 
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gga att gga ggt ttt ate aaa gta aga cag tat gat cag gta ccc ata 192 
Gly He Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin Val Pro He 
50 55 60 

gaa ate tgt gga cat aaa get ata ggt aca gta tta gta gga cct aca 240 
Glu He Cys Gly His Lys Ala He Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ctg atg aca cag ctt ggt tgt act 288 
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Leu Gly Cys Thr 
85 90 95 

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro lie Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys He Ser Lys He Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gee ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala He Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

agg gaa ctt aat aag aaa act caa gac ttc tgg gaa gtt caa tta ggg 576 
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat cct gca gga tta aaa aag aat aaa tea gta aca gta ctg 624 
He Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat aaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr He Pro Ser He Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aaa att tta gat cct ttt aga aaa 816 
Ala He Phe Gin Ser Ser Met Thr Lys lie Leu Asp Pro Phe Arg Lys 
260 265 270 

cag aat cca gat ata gtt atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gag ata ggg cag cat aga gca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu 
290 295 300 

aga gca cat ctg ttg aag tgg gga ttt acc acc cca gac aaa aaa cat 960 
Arg Ala His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac age tgg act 1056 
Lys Trp Thr Val Gin Pro He Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104 
Val Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tac gca ggg 1116 
He Tyr Ala Gly 
370 

<210> 38 
<211> 1117 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 



<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1117) 

<223> Portion of HIV Reverse Transcriptase 



<400> 38 

cct caa tca ctt ctt tgg caa cga ccc mtc gtc aca ata aag gta ggg 48 
Pro Gln Ser Leu Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Val Gly 
1 5 10 15 

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca ata 96 
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Ile 
20 25 30 

tta gaa gac aya rat ttg cca ggg aga tgg aaa cca aaa ata ata ggg 144 
Leu Glu Asp Xaa Xaa Leu Pro Gly Arg Trp Lys Pro Lys Ile Ile Gly 
35 40 45 

gga att gga ggt ttt atc aga gta aga cag tat gat cag gta ccc ata 192 
Gly Ile Gly Gly Phe Ile Arg Val Arg Gln Tyr Asp Gln Val Pro Ile 
50 55 60 

gaa atc tgt gga cat aaa gtt gta agt aca gta tta gta gga cct aca 240 
Glu Ile Cys Gly His Lys Val Val Ser Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gcc aac ata att gga aga aat ctg atg act cag att ggt tgc act 288 
Pro Ala Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt gaa gaa ttg gaa aag gat ggg 432 
Lys He Lys Ala Leu Val Glu He Cys Glu Glu Leu Glu Lys Asp Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aag aac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala He Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat cct gca gga tta aaa aag aaa aaa tea gta aca gta ctg 624 
He Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea att ccc tta gat gaa gac ttc aga 672 
Asp Val Gly Asp Ala Tyr Phe Ser He Pro Leu Asp Glu Asp Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr He Pro Ser He Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

tea ata ttc caa agt age atg aca aaa ate tta gag cct ttt aga aaa 816 
Ser He Phe Gin Ser Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac ata gtc ate tat caa tat atg gat gat ttg tat gta 864 
Gin Asn Pro Asp He Val He Tyr Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gag ata ggg cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu lie Gly Gin His Arg Thr Lys lie Glu Glu Leu 
290 295 300 

aga cag cat ctg tgg aag tgg ggg ttt tac aca cca gac ara aaa cat 960 
Arg Gln His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Xaa Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gac 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac age tgg act 1056 
Lys Trp Thr Val Gin Pro lie Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104 
Val Asn Asp lie Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tan tsc agg g 1117 
lie Xaa Xaa Arg 
370 

<210> 39 
<211> 1128 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1128) 

<223> Portion of HIV Reverse Transcriptase 
<400> 39 

cct cag atc act ctt tgg caa cga cca ttc gtc aca ata aaa ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Phe Val Thr Ile Lys Ile Gly 
1 5 10 15 

ggg caa eta aag gaa get ata tta gac aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala He Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met lie Gly 
35 40 45 

gga att gga ggt ttt mtc aaa gta aga cag tat gat cag gta ccc ata 192 
Gly He Gly Gly Phe Xaa Lys Val Arg Gin Tyr Asp Gin Val Pro He 
50 55 60 

gaa ate tgt gga cat aaa gtt atg agt aca gta tta ata gga cct aca 240 
Glu He Cys Gly His Lys Val Met Ser Thr Val Leu He Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ctg atg act cag mtt ggc tgc act 288 
Pro Val Asn He He Gly Arg Asn Leu Met Thr Gin Xaa Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gwa cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Xaa Pro Val Lys Leu Lys 
100 105 110 

cca ggg atg gac ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa gga 432 
Lys lie Lys Ala Leu Val Glu lie Cys Thr Glu Leu Glu Lys Glu Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys lie Ser Lys lie Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gec ata aag aaa aaa gac agt aat aaa tgg aga aaa tta gta gat ttc 528 
Ala lie Lys Lys Lys Asp Ser Asn Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gat ttc tgg gaa gtc caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat cct gca ggg tta aaa aag aac aaa tea gta aca gta ctg 624 
lie Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat raa gat tea gra 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Xaa Asp Ser Xaa 
210 215 220 

agt aca ctg cat tta cca tac eta gta cgr acc aat gag aca cca ggg 720 
Ser Thr Leu His Leu Pro Tyr Leu Val Xaa Thr Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat caa tac aat gtg ctt cca caa gga tgg aaa gga tea cca 768 
lie Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aaa ate tta gag cct ttt aga aaa 816 
Ala lie Phe Gin Ser Ser Met Thr Lys lie Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac tta gtt ate tgt caa tac atg gat gat ttg tat gta 864 
Gin Asn Pro Asp Leu Val lie Cys Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gat tta gaa ata gag cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu lie Glu Gin His Arg Thr Lys He Glu Glu Leu 
290 295 300 

aga caa cat ctg tgg aag tgg ggg ttt tac aca cca gac aaa aaa cat 960 
Arg Gin His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc cgt tgg atg ggt tat gaa etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Arg Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta caa gee tat aaa get gee aga aaa aga cag ctg gac 1056 
Lys Trp Thr Val Gin Ala Tyr Lys Ala Ala Arg Lys Arg Gin Leu Asp 
340 345 350 



tgt caa tga cat tac mag aaa gtt agt ggg gaa aat tgg aat ttg ggg 1104 
Cys Gin * His Tyr Xaa Lys Val Ser Gly Glu Asn Trp Asn Leu Gly 
355 360 365 

caa ggt cag att tat tgc cag ggg 1128 
Gin Gly Gin lie Tyr Cys Gin Gly 
370 375 

<210> 40 
<211> 1120 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1120) 

<223> Portion of HIV Reverse Transcriptase 
<400> 40 

cct cag atc act ctt tgg caa cga ccc ctc gtt gca ata aag ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Ala Ile Lys Ile Gly 
1 5 10 15 

gga cag eta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg agt ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Ser Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga car tat gat cag ata ccm rta 192 
Gly He Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin He Xaa Xaa 
50 55 60 

gaa att tgc gga cat aaa get gta ggt aca gta tta gta gga cct aca 240 
Glu He Cys Gly His Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ctg ttg act cag mtt ggt tgc act 288 
Pro Val Asn He He Gly Arg Asn Leu Leu Thr Gin Xaa Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gtg aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432 
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly 
180 185 190 

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tca gtt cct tta gat gaa gac ttc agr 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Xaa 
210 215 220 

aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tcc aat gtg ctt cca cag gga tgg aaa gga tca cca 768 
Ile Arg Tyr Gln Ser Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt agc atg aca aaa atc cta gaa cct ttt agg aaa 816 
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gat ata gtt atc tat caa tac atg gat gat cta tat gta 864 
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata gaa cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg agg tgg ggg ttt acc acc cca gac aaa aaa cat 960 
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac nat aca aaa gtt agt ggg gaa aat tga att ggg sca agt 1104 
Val Asn Asp Xaa Thr Lys Val Ser Gly Glu Asn * Ile Gly Xaa Ser 
355 360 365 

cag att tat tgg agg g 1120 
Gln Ile Tyr Trp Arg 
370 

<210> 41 
<211> 1059 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1059) 

<223> Portion of HIV Reverse Transcriptase 
<400> 41 

cct caa atc act ctt tgg cag cga ccc gtt gtc aca ata aac ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Val Val Thr Ile Asn Ile Gly 
1 5 10 15 

ggg caa eta aag gaa get eta tta gac aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga cag tat gat cag ata ccc ata 192 
Gly He Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin He Pro He 
50 55 60 



gaa atc tgt gga cat aaa act ata ggt aca gta tta ata gga cct aca 240 
Glu Ile Cys Gly His Lys Thr Ile Gly Thr Val Leu Ile Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ctg ttg act cag att ggc tgc act 288 
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu 
115 120 125 



aaa ata aaa gca tta ata gaa att tgt aca gaa atg gaa aag gaa ggg 432 
Lys He Lys Ala Leu He Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aac ccg tac aat act cca gtc ttt 480 
Lys He Ser Lys He Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gcc ata aag aaa aaa gat agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aac aag aaa act caa gac ttc tgg gaa att caa tta gga 576 
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Ile Gln Leu Gly 
180 185 190 

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttc tca gtt cct tta gat aaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt aca aac aat gag acg cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768 
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro 
245 250 255 

gcc ata nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn 816 
Ala Ile Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 
260 265 270 

nnn nnn nnn nnn nnn nnn nnn tat caa tac atg gat gat ttg tat gta 864 
Xaa Xaa Xaa Xaa Xaa Xaa Xaa Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata gag cag cat aga aca aaa ata gag aaa ctg 912 
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Lys Leu 
290 295 300 

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gat aaa aaa cat 960 
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gta ctg cca gaa aaa gac agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc 1059 
Val 



<210> 42 
<211> 1053 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 
<221> CDS 

<222> (298) . . . (1053) 

<223> Portion of HIV Reverse Transcriptase 
<400> 42 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata arg ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Xaa Ile Gly 
1 5 10 15 

ggg caa eta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt atm aaa gta aga cag tat gat cag ata cyc ata 192 
Gly Ile Gly Gly Phe Xaa Lys Val Arg Gln Tyr Asp Gln Ile Xaa Ile 
50 55 60 

gaa ate tgt gga yat aaa get ata ggt acr gta tta gta gga ccc acg 240 
Glu He Cys Gly Xaa Lys Ala He Gly Xaa Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac rta att gga aga aat ctg wtg act cag att ggt tgc act 288 
Pro Val Asn Xaa He Gly Arg Asn Leu Xaa Thr Gin He Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gec ata aag aaa aaa gac agt act aaa tgg aga aaa ttr gta gat ttc 528 
Ala lie Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Xaa Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aaa act caa gac ttc tgg gaa gtc caa tta gga 576 
Arg Glu Leu Asn Lys Lys Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat ccc gca ggg tta aag aag aaa aaa tea gta aca gta ctg 624 
lie Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat aaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 
210 215 220 
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gga att gga ggt ttt ate aaa gta aga cag tat gat cag ata ccc ata 192 
Gly lie Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin He Pro He 
50 55 60 

gaa aty tgt ggg cat aaa get ata ggt aca gta tta gta ggg cct aca 240 
Glu Xaa Cys Gly His Lys Ala He Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ttg ttg act cag att ggt tgc act 288 
Pro Val Asn He He Gly Arg Asn Leu Leu Thr Gin He Gly Cys Thr 
85 90 95 

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc ccc aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aaa gaa ggg 432 
Lys lie Lys Ala Leu Val Glu lie Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys He Ser Lys lie Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 



gcc ata aag aaa aag gac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 



aga gaa ctt aat aag aga act caa gac ttt tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata ccg cat ccc gca ggg tta aaa aag aaa aag tea gta aca gta ctg 624 
lie Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat aaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 
210 215 220 

aaa tat ast gca ttt ace ata ccg agt ata aac aat gag aca cca ggg 720 
Lys Tyr Xaa Ala Phe Thr lie Pro Ser lie Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt ccg cag gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa tgt age atg aca aaa ate tta gaa cct ttt aga aaa 816 
Ala lie Phe Gin Cys Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864 
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac ttg gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu 
290 295 300 

aga cag cat ctg ttg aaa tgg ggr ttt acc aca cca gac aag aaa cat 960 
Arg Gln His Leu Leu Lys Trp Xaa Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggg tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta caa ccg ata gag ctg cca gaa aaa gaa agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Glu Leu Pro Glu Lys Glu Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gg 1082 
Val Asn Asp Ile Gln Lys Leu Val 
355 360 

<210> 44 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV) 

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 44 

cct cag atc act ctt tgg caa cga ccc atc gtc aca gta aag ata ggg 48 
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Val Lys Ile Gly 
1 5 10 15 

ggg caa cta aag gaa gct yta tta gat aca gga gca gat gat aca gta 96 
Gly Gln Leu Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat tta cca gga aaa tgg aaa cca aaa ata ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Ile Ile Gly 
35 40 45 

gga att gga ggt ttt gcc aaa gta aga cag tat gat cag ata ccc ata 192 
Gly Ile Gly Gly Phe Ala Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile 
50 55 60 

gaa atc tka gga cat aaa gtt ata ggt aca gtc tta gta gga cct aca 240 
Glu Ile Xaa Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gcc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288 
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aag att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys He Ser Lys He Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gec ata aag aaa aaa aac agy act wga tgg aga aaa tta gta gat ttc 528 
Ala He Lys Lys Lys Asn Xaa Thr Xaa Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa ttr gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Xaa Gly 
180 185 190 

ata cca cat ccc tea ggg tta aaa aag aam aaa tea gta aca gta ctg 624 
He Pro His Pro Ser Gly Leu Lys Lys Xaa Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat aaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 
210 215 220 

aaa tat act gca ttt acc ata cct agt rta aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr He Pro Ser Xaa Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aga ate eta gag cct ttt aga aaa 816 
Ala He Phe Gin Ser Ser Met Thr Arg He Leu Glu Pro Phe Arg Lys 
260 265 270 

cag aat cca gac ata gtt ate tat caa tac gtg gat gac ttg ctt gta 864 
Gin Asn Pro Asp He Val He Tyr Gin Tyr Val Asp Asp Leu Leu Val 
275 280 285 

gga tct gat tta gaa ata ggg caa cat aga gca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu He Gly Gin His Arg Ala Lys He Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg agg tgg ggg ttt ate aca cca gac gaa aaa cat 960 
Arg Gin His Leu Leu Arg Trp Gly Phe He Thr Pro Asp Glu Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag ccc ata gtg ctg cca gaa aaa gay agc tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata caa aag tta gtg gga aaa ttg aat tgg gca age cag 1104 
Val Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tat gca ggg 1116 
He Tyr Ala Gly 
370 

<210> 45
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase

<400> 45

cct cag atc act ctt tgg caa cga ccc rtc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gac gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat tta cca gga aaa tgg aaa cca aaa atg ata gtg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Val
35 40 45

gga att gga gga ttt gtc aaa gta aaa cag tat gag caa ata cct gta 192
Gly Ile Gly Gly Phe Val Lys Val Lys Gln Tyr Glu Gln Ile Pro Val
50 55 60

gaa atc tgt gga cat aaa gct gta ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca aaa gar 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Lys Glu
115 120 125

aaa ata maa gca ttg gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Xaa Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gtg ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aag aac agt gat aga tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Asp Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag agg act caa gac ttc tgg gaa att caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Ile Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aag aaa tca gta aca rta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Xaa Leu
195 200 205

gat gtg ggt gat gca tat ttt tca rtt ccc tta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Xaa Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat caa tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa gct agc atg aca aaa atc tta gag cct ttc aga aaa 816
Ala Ile Phe Gln Ala Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa cta gtt atc tat caa tac gtg gat gac ttg tat gta 864
Gln Asn Pro Glu Leu Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gga cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga gaa cat ctg tta aaa tgg gga tta ttc aca cca gac cag aaa cat 960
Arg Glu His Leu Leu Lys Trp Gly Leu Phe Thr Pro Asp Gln Lys His
305 310 315 320

cag aaa gaa ccc cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg act ata cag cct atg gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Ile Gln Pro Met Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac cta cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Leu Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370

<210> 46
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase

<400> 46

cct caa atc act ctt tgg caa cga ccc ctc gtc aca ata aaa gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Val Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga agg tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata tcc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Ser Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gac ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gag att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aac cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160






gcc ata aag aaa aaa gac agt act aag tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aaa aga act caa gac ttc tgg gag gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggc gat gca tat ttc tca gtt ccc tta gat gaa gac ttc aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aaa tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

act aga tat cag tac aat gtg ctc cca cag gga tgg aaa gga tca cca 768
Thr Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa tgt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac cta gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Leu Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gga cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc acc cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtr cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Xaa Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370



<210> 47
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase

<400> 47

cct caa atc act ctt tgg caa cga ccc ctc gtc aca gta aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Val Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac atg tgt ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Met Cys Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga caa tat gat cag gta gcc atg 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Val Ala Met
50 55 60

gaa atc tgt gga cat aag gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agc cct att gaa act gta ccm gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Xaa Val Lys Leu Lys
100 105 110

cca ggr atg gat ggt cca agg gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Xaa Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata ara gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Xaa Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttt 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac tty tgg gaa gtt caa tta ggr 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Xaa
180 185 190

ata ccg cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctt 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg gga gat gca tat ttt tca gtt ccc tta gat aaa gat ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat caa tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa att tta gag cct ttt aga aag 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gga cag cat aga aca aaa ata gag gaa ctr 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Xaa
290 295 300

aga caa cat ctg ttg aag tgg ggg ytt acc aca cca gac aag aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Xaa Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa ccy cca ttc ctt tgg atg ggk tat gaa ctc cat cct gat 1008
Gln Lys Glu Xaa Pro Phe Leu Trp Met Xaa Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aar ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 48
<211> 1115
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1115)
<223> Portion of HIV Reverse Transcriptase

<400> 48

cct cag atc act ctt tgg caa cga ccc ctc gtc aca gta aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Val Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

ata gaa gac ata gaa ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Ile Glu Asp Ile Glu Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aaa cag tat gag cag gta ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Lys Gln Tyr Glu Gln Val Pro Ile
50 55 60

gaa ctc tgt ggg cgt aaa act ata ggt aca gta tta gta gga cct aca 240
Glu Leu Cys Gly Arg Lys Thr Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aac ctg atg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta ata gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aac act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcy ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Xaa Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aag aaa tca gta aca gta ttg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccg tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctk cca cag gga tgg aag gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Xaa Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc ttg gag ccc ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac cta gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Leu Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

ggc tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg aag tgg gga ttt acc aca cca gat aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tcc car ga 1115
Ile Ser Gln
370

<210> 49
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase

<400> 49

cct cag atc act ctt tgg caa cga ccc ctc gtc rca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Xaa Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aag atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttc atc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt ggc cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat cta ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aag tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa atc tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aam aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Xaa Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat acc gca ttt cca tcc cta gtt ata aac aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Pro Ser Leu Val Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

atc aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa gta gag gag ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Val Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gag cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agc cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 50
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase

<400> 50

cct cag atc act ctt tgg caa cga ccc ttc gtc aac ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Phe Val Asn Ile Lys Ile Gly
1 5 10 15

gga caa ctg aag gaa gct cta ttg gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa ttg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Leu Ile Gly
35 40 45

gga att gga ggt ttk gtc aaa gta aga cag tat gat cag ata cct gta 192
Gly Ile Gly Gly Xaa Val Lys Val Arg Gln Tyr Asp Gln Ile Pro Val
50 55 60

gaa att tgt gga cat aaa gyt ata ggt aca gtc tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Xaa Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag att ggc tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc ccg aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aag att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aaa aag aac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta mam aag aac aaa tca gta aca gtg cta 624
Ile Pro His Pro Ala Gly Leu Xaa Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta tat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Tyr Glu Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tay aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc cag agt agc atg aca aga atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Arg Ile Leu Glu Pro Phe Arg Lys
260 265 270



caa aat cca gaa ata gtc atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gca tct gac tta gaa ata gag aaa cat aga aca aaa ata gag gaa ctg 912
Ala Ser Asp Leu Glu Ile Glu Lys His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt tac aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aaa tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gga ggg 1116
Ile Tyr Gly Gly
370

<210> 51
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase

<400> 51

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa ata ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Ile Ile Gly
35 40 45

gga att gga ggt ttt aty aaa gta aga cag tat gat cag ata cct ata 192
Gly Ile Gly Gly Phe Xaa Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa act ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Thr Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga gat ctg ttg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asp Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa ttg aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggt cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ctg gaa aag gat ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Asp Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cct gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa aac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gcg ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca att ccc tta gat gaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Asp Glu Glu Phe Arg
210 215 220



aag tat act gca ttc acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

gtt aga tat caa tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Val Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag ccc ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tat gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tgg agg tgg ggg ttt tac aca cca gac aaa aaa cat 960
Arg Gln His Leu Trp Arg Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta caa cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aaa tta gtg ggg aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370



<210> 52
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase

<400> 52

cct caa atc act ctt tgg caa cga ccc ctt gtc aca ata aag rta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Xaa Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atr ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Xaa Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ycc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Xaa Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt tca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Ser Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata aty gga aga aat ctg atg act cag att ggt tgc act 288
Pro Val Asn Ile Xaa Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa ack gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Xaa Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aag caa tgg cca ttg aca gra gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Xaa Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gag atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140



aaa att tca aga att ggg ccc gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Arg Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aaa aag aat agt act aga tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

agg gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aac aaa tca gtg aca gta ytg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Xaa
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt atr aac aat gag aaa cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Xaa Asn Asn Glu Lys Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca car gga tgg aaa ggg tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa tgt agc atg aca aaa aty tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Xaa Leu Glu Pro Phe Arg Lys
260 265 270

car aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gaa cag cat aga aca aaa ata gag gaa ttg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tta agg tgg gga ttt ttc aca cca gaa caa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Phe Thr Pro Glu Gln Lys His
305 310 315 320

cag aaa gaa ccg cca ttc ctt tgg atg ggt tat gaa cta cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg acg gta cag cct ata aag ctg cca gaa aaa gat agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Lys Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tay gca ggg 1116
Ile Tyr Ala Gly
370

<210> 53
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase

<400> 53

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aaa gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat tta cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gtg aga cag tat gat cag rta ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Xaa Pro Ile
50 55 60

gaa att tgt gga cat aaa gct ata ggt aca gta tta gta gga tct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Ser Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggg tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gag atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

atc cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc cgg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt agg aat 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Asn
260 265 270

aaa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Lys Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac cta gaa ata ggg cag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300



aga gaa cat ctg ttg aag tgg ggg ttt act aca cca gac aaa aaa cat 

Arg Glu His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His 

305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa etc cat cct gat 

Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gtc cag cct ata gag ctg cca gaa aaa gac age tgg act 

Lys Trp Thr Val Gin Pro He Glu Leu Pro Glu Lys Asp Ser Trp Thr 

340 345 350 



960 



1008 



1056 



gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104 
Val Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tat gca gga 1116 
He Tyr Ala Gly 
370 
<210> 54
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0) . . . (297)

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 54 

cct cag atc act ctt tgg caa cga ccc aty gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct yta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac atg gat ttg cca gga aga tgg aaa cca aaa atg ata gtg 144
Leu Glu Asp Met Asp Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Val
35 40 45

gga att gga ggt ttt gtc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Val Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa att ata ggt aca gta tta ata gga aat aca 240
Glu Ile Cys Gly His Lys Ile Ile Gly Thr Val Leu Ile Gly Asn Thr
65 70 75 80

cct gcc aac gta att gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Ala Asn Val Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ctg gaa aag gat ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Asp Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag gac agt act aaa tgg aga aaa gta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Val Val Asp Phe
165 170 175

aga gaa ctt aac aag aga act caa gac ttc tgg gag gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cac ccc gca ggg ata aaa aag aat aaa tca gta act gta cta 624
Ile Pro His Pro Ala Gly Ile Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gta ggt gat gca tat ttc tca gtt ccc tta gat gaa gac ttc aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aaa tat act gca ttc acc ata cct agt att aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctc cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cac aga ata aaa ata rag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ile Lys Ile Xaa Glu Leu
290 295 300

aga gaa cat cta tgg aag tgg gga ttt tac aca cca gac aaa aag cat 960
Arg Glu His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata acg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Thr Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg ggg aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 55 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 55 

cct caa atc act ctt tgg caa cga ccc ctc gtc gca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Ala Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gtc 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aag cag tat gat cag gta ctt gta 192
Gly Ile Gly Gly Phe Ile Lys Val Lys Gln Tyr Asp Gln Val Leu Val
50 55 60

gaa att tgt gga cat ara gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Xaa Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgt act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca ggt atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta ata gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt acc aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa acg caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aag gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt gta aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Val Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctg cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac atg gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Met Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gag cag cat aga rca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Xaa Lys Ile Glu Glu Leu
290 295 300

agg cag cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata ktg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Xaa Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tam ccc ngg 1116
Ile Xaa Pro Xaa
370

<210> 56 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 



<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 56

cct caa atc act ctt tgg caa cga ccc att gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata acc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Thr Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aaa gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gat agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gta caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttc acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0) . . . (297)
<223> HIV Protease

<221> CDS

<222> (298) . . . (1116)

<223> Portion of HIV Reverse Transcriptase
<400> 62

cct caa atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga caa tat gat cag ata gcc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Ala Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg atg act cag att ggc tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gat act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Asp Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag aat agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190
att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agc agc atg aca aaa att tta gaa cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta raa ata gag cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Xaa Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa cag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Gln Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370

<210> 57 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0) . . . (297)

<223> HIV Protease 



<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 57 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca gta aag tta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Val Lys Leu Gly
1 5 10 15

ggg caa cta atg gaa gtt cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Met Glu Val Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

rta gaa gaa ata agt tta cca gga aga tgg aaa cca aaa atg ata ggg 144
Xaa Glu Glu Ile Ser Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt gtc aaa gta aaa cag tat gat cag gta ccc tta 192
Gly Ile Gly Gly Phe Val Lys Val Lys Gln Tyr Asp Gln Val Pro Leu
50 55 60

gaa att tgt gga aaa aag gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly Lys Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ttt ttg gct cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Phe Leu Ala Gln Ile Gly Cys Thr
85 90 95

tta aat ttc ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag aac agt act aga tgg aga aaa tta gta gat ttt 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag agg acs caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Xaa Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aar aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat cca gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Pro Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cca ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tgt caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gag cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt tac aca cca gac caa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Tyr Thr Pro Asp Gln Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata acg ctg cca gac aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Thr Leu Pro Asp Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 58
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0) . . . (297)

<223> HIV Protease

<221> CDS

<222> (298) . . . (1116)

<223> Portion of HIV Reverse Transcriptase
<400> 58

cct caa atc act ctt tgg caa cga ccc cta gtt aca ata aaa ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg act ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Thr Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga car tat gat cag ata ctc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile
50 55 60
gaa att tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag atc ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gag act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aar caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca aaa atg aaa aaa aaa aaa 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aaa aaa tta ata aat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cca gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aac gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca ata atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Ile Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300
aga cag cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cca gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata aag ctg cca gac aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Lys Leu Pro Asp Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca gga 1116
Ile Tyr Ala Gly
370

<210> 59
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0) . . . (297)

<223> HIV Protease

<221> CDS

<222> (298) . . . (1116)

<223> Portion of HIV Reverse Transcriptase
<400> 59

cct caa atc act ctt tgg caa cga ccc tta gtc aca ata aag ata grg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Xaa
1 5 10 15

ggg caa cta aaa gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa ata aat ttg cca ggg aaa tgg aaa cca maa atg ata ggg 144
Leu Glu Glu Ile Asn Leu Pro Gly Lys Trp Lys Pro Xaa Met Ile Gly
35 40 45

gga att gga ggt ttt att aaa gta aga cag tat gat caa ata gcc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Ala Ile
50 55 60

gaa att tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95
tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta rta gaa atc tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Xaa Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcm ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Xaa Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtc caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttc tca gtt ccc tta gac caa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Gln Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca agg atc tta gar cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Arg Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtc aty tat cag tac atg gat gat tta tat gta 864
Gln Asn Pro Glu Ile Val Xaa Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa gta gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Val Glu Glu Leu
290 295 300

aga caa cat ctg ttg agr tgg ggg ttt tmc acg cca gac aaa aag cat 960
Arg Gln His Leu Leu Xaa Trp Gly Phe Xaa Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag act ata gaa ctg cca gaa aaa gat agc tgg act 1056
Lys Trp Thr Val Gln Thr Ile Glu Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

ata tac cca ggg 1116
Ile Tyr Pro Gly
370



<210> 60 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 60 

cct caa atc act ctt tgg cag cga ccc cty gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aaa gaa gct cta tta gay aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca ggr aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Xaa Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata cct rta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Xaa
50 55 60

gaa att tgt gga cat aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg atg act cag ctt ggc tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gag 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt aat aga tgg aga aaa tta gtg gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Asn Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aar aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta raa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Xaa Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt acc aat aat gag aca ccm ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Xaa Gly
225 230 235 240

gtt aga tat cag tat aat gta ctt ccc cag gga tgg aaa gga tca cca 768
Val Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca tat tty caa tgt agy atg aca aaa atc tta aag cct ttc agg aaa 816
Ala Tyr Phe Gln Cys Xaa Met Thr Lys Ile Leu Lys Pro Phe Arg Lys
260 265 270

caa aat cca cac ata gtt att ttt caa tat gtg gat gac ttg tat gta 864
Gln Asn Pro His Ile Val Ile Phe Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gca tct gac tta gaa ata gag cag cat aga aca aaa ata gag gaa ctg 912
Ala Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ttg ttg agg tgg gga ctc acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Leu Thr Thr Pro Asp Lys Lys His
305 310 315 320

caa aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag ccc ata acg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Thr Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370



<210> 61 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0)...(297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 61 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aaa gat agg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Asp Arg
1 5 10 15

ggg gca agt aaa gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Ala Ser Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa ata aat ttg cca ggg rag tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Ile Asn Leu Pro Gly Xaa Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tmt gat cag ata ccc gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Xaa Asp Gln Ile Pro Val
50 55 60

gaa att tgt gga cat aag gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag mtt ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Xaa Gly Cys Thr
85 90 95

tta aat ttt ccc atc agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca tta aca gag gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aaa aag gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt cag tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa agc ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Ser Phe Arg
210 215 220

aag tac act gca ttt acc ata ccc agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

rca aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Xaa Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa atg gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Glu Met Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gag ata gag caa cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370



<210> 62 
<211> 1116 
ata cca cat ccc gca ggg cta aaa aag aay aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtc ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

rtt aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Xaa Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

tca ata ttc caa tgt agc atg acg aaa atc tta gag cct ttt aga aaa 816
Ser Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

cag aat cca gac ata gtt atc trt caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Xaa Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gca tct gac tta gaa ata gag cag cat aga ata aaa ata gag gaa cta 912
Ala Ser Asp Leu Glu Ile Glu Gln His Arg Ile Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg aag tgg gga ttt tac aca cca gac aaa aaa yat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys Xaa
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gar ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag ttr gtg gga aaa ctg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Xaa Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 63
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 63

cct caa atc act ctt tgg caa cga ccc gtt gtt aca gta agg ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Val Val Thr Val Arg Ile Gly
1 5 10 15

gga cag cta acg gaa gct yta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Thr Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg act ttg cca gga aaa tgg aaa cca aaa ata ata ggg 144
Leu Glu Glu Met Thr Leu Pro Gly Lys Trp Lys Pro Lys Ile Ile Gly
35 40 45

ggr att gga ggt ttt atc aaa gta aga cag tat gat cac gta ctt gta 192
Xaa Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp His Val Leu Val
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ttg atg act cag ctt ggg ttc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Leu Gly Phe Thr
85 90 95

tta aat ttt cca att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca ggg atg gat ggc cca aaa gtt aaa caa tgg cca ttg mca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Xaa Glu Glu
115 120 125

aaa ata aaa gca cta aca gaa att tgt aca gaa ttg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Thr Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aga ata ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Arg Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aag aaa aac ggt ayt agg tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Gly Xaa Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gag cta aat aag aga act caa gac ttc tgg gaa gtt caa cta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca gga cta aaa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta cat gaa gac ttt aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu His Glu Asp Phe Arg
210 215 220

aag tat acc gca ttc acc ata cct agt aca aac aat gaa aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca caa gga tgg aaa gga tca ccg 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg acc aaa atc tta gaa cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa atg gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Glu Met Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga ata aaa ata gag gaa tta 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ile Lys Ile Glu Glu Leu
290 295 300

agg gaa cac cta ttg aag tgg gga ttt ttc acc cca gac gaa aag cat 960
Arg Glu His Leu Leu Lys Trp Gly Phe Phe Thr Pro Asp Glu Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctt cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtg cag cct ata aaa ctg cca gaa aaa gaa agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Lys Leu Pro Glu Lys Glu Ser Trp Thr
340 345 350

gtc aat gat ata cag aag tta gtg gga aaa tta aat tgg gca agc cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca gga 1116
Ile Tyr Pro Gly
370



<210> 64
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 64

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat tta cca gga aaa tgg aaa cca aaa atr ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Xaa Ile Gly
35 40 45

gga att gga ggy ttt rtc aaa gta aga cag tat gat cag ata syc ata 192
Gly Ile Gly Xaa Phe Xaa Lys Val Arg Gln Tyr Asp Gln Ile Xaa Ile
50 55 60

gaa atc tgc gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gyc aac ata att gga aga aat ctg ttg act cag ctt ggg tgc act 288
Pro Xaa Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta caa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Gln Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca tta aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aag ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aag gac agt gct aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Ala Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

agg gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta ggg 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cck cat ccc gca ggg ttr aaa aag aaa aaa tca gta aca gta cta 624
Ile Xaa His Pro Ala Gly Xaa Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gta ggt gat gca tat ttt tca gtt ccc tta gat caa aac ttc aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Gln Asn Phe Arg
210 215 220

aag tat act gca ttc acc ata cct agt ata aac aat gag ayg cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Xaa Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gar ata rtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Xaa Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac ttr gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Xaa Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ytg ttg aag tgg gga ttt acc aca cca gac aag aag cat 960
Arg Gln His Xaa Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggg tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata atg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Met Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca gga 1116
Ile Tyr Ala Gly
370

<210> 65
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 65

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac atc aat ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Ile Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt gtc aaa gta aga gag tat gat cag gta ccc ata 192
Gly Ile Gly Gly Phe Val Lys Val Arg Glu Tyr Asp Gln Val Pro Ile
50 55 60

gac atc tgt gga cat aaa gtt ata ggt aca gtg tta gta gga cct aca 240
Asp Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gar atc tgt aca gaa ttg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aay cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctr 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Xaa
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

rtt aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Xaa Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa att tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata att atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Ile Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gat ttg gaa ata gag cag cat aga aca aaa ata gag gaa cta 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga gaa cat ctg tgg aag tgg gga ttt tac aca cca gac aaa aaa cat 960
Arg Glu His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aag gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata aag ytg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Lys Xaa Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt caa 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370

<210> 66
<211> 1116
<212> DNA
<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 66

cct cag atc act ctt tgg caa cga ccc ctt gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gak rca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Xaa Xaa Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta agr car tat gac cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Xaa Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cag aaa gct ata ggt aca gta tta gta gga cct acm 240
Glu Ile Cys Gly Gln Lys Ala Ile Gly Thr Val Leu Val Gly Pro Xaa
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act caa att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gca gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Ala Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt aat ara tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Asn Xaa Trp Arg Lys Leu Val Asp Phe
165 170 175

agg gaa ctc aat aag aga act caa gac ttc tgg gaa gtt caa tta ggc 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aam aaa tca gta aca rta ctr 624
Ile Pro His Pro Ala Gly Leu Lys Lys Xaa Lys Ser Val Thr Xaa Xaa
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aar tat act gca ttt acc ata cct agt aca wac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Xaa Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag krc aat gtg yyt cca cag gga tgg aaa gga tcm cca 768
Ile Arg Tyr Gln Xaa Asn Val Xaa Pro Gln Gly Trp Lys Gly Xaa Pro
245 250 255 

gca ata ttc mam agt agc ayg aca aaa att tta gag cct ttt aga aaa 816
Ala Ile Phe Xaa Ser Ser Xaa Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tgt caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Cys Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ttg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

agg caa cat ttg ttg agg tgg ggr ttt acc aca cca gac ara aaa cat 960
Arg Gln His Leu Leu Arg Trp Xaa Phe Thr Thr Pro Asp Xaa Lys His
305 310 315 320

cag aaa gag cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata aaa ctg cca gaa aaa gay agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Lys Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 67
<211> 1119
<212> DNA
<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1119)
<223> Portion of HIV Reverse Transcriptase
<400> 67

cct caa atc act ctt tgg caa cga cca ata gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

cta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt aty aaa gta aga cag tat gat cag ata tcc ata 192
Gly Ile Gly Gly Phe Xaa Lys Val Arg Gln Tyr Asp Gln Ile Ser Ile
50 55 60

gaa atc tgt ggg cat aaa gtt aca ggt aca gtg tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Val Thr Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95



tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca ttg gta gaa att tgt gca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Ala Glu Met Glu Lys Glu Gly
130 135 140

caa att tca aaa att gag cct gaa aat cca tac aat aat cca gta ttt 480
Gln Ile Ser Lys Ile Glu Pro Glu Asn Pro Tyr Asn Asn Pro Val Phe
145 150 155 160

gtc ata aag aaa aaa gac ggt act aac tgg aga aaa tta ata gat ytc 528
Val Ile Lys Lys Lys Asp Gly Thr Asn Trp Arg Lys Leu Ile Asp Xaa
165 170 175

aga gaa ctt aat aag aga act caa gat ttc tgg gaa att caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Ile Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aat aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca ttt tat tca gtt ccc tta gat gag aac ttc agg 672
Asp Val Gly Asp Ala Phe Tyr Ser Val Pro Leu Asp Glu Asn Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca atg gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Met Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa att tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

aac aat cca gac ata gtc atc tat caa tac atg gat gat ttg tat gta 864
Asn Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gca tct gat tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Ala Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga gaa cat cta ttr aag tgg gga ttt acc aca cca gac aar aar yat 960
Arg Glu His Leu Xaa Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys Xaa
305 310 315 320

cag aaa gaa cct cca ytc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Xaa Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg att 1119
Ile Tyr Pro Gly Ile
370

<210> 68
<211> 1119
<212> DNA
<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1119)
<223> Portion of HIV Reverse Transcriptase
<400> 68

cct caa atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

gga caa cta aaa gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca ggg aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga atc gga gga ttt atc aaa gta aga cag tat gag cag ata cac ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Glu Gln Ile His Ile
50 55 60

gaa atc tgt ggg cat aaa gct ata ggt aca gtr tta ata gga ccc aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Xaa Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggc tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gag 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gtt ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aaa aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg ttg aag aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa aac ttt agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asn Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aat aat gaa aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa gct agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ala Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac atg rtt att tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Met Xaa Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

ggc tct gac tta gaa ata gga cag cat aga aca aaa ata gaa gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg ggg ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctc tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gcg agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg att 1119
Ile Tyr Pro Gly Ile
370

<210> 69
<211> 1119
<212> DNA
<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1119)
<223> Portion of HIV Reverse Transcriptase
<400> 69

cct cag atc act ctt tgg caa cga ccc cty gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa yta aag gaa gct mta tta gay aca gga gca gat gat aca gtg 96
Gly Gln Xaa Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa ata ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Ile Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga gag tat gag cag ata caa gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Glu Tyr Glu Gln Ile Gln Val
50 55 60

gaa atc tgt gga cat aag gct ata rgt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Xaa Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat cta atg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gag act gta ccg gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggt cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat acy ccr gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Xaa Xaa Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata ccg cat ccc gca ggg tta aag aag aaa aaa tca gta aca gta ctr 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Xaa
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agt ata aac aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gaa cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat car tac atg gat gac ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa cta 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tkg agg tgg gga ttt tac aca cca gac aaa aaa cat 960
Arg Gln His Leu Xaa Arg Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cac cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctr cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Xaa Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gcg agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat tca ggg att 1119
Ile Tyr Ser Gly Ile
370

<210> 70
<211> 1119
<212> DNA
<213> Human Immunodeficiency Virus (HIV)
<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease
<221> CDS
<222> (298)...(1119)
<223> Portion of HIV Reverse Transcriptase
<400> 70

cct caa atc act ctt tgg caa cga ccc cty gtc kca ata aag gta ggr 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Xaa Ile Lys Val Xaa
1 5 10 15

ggg caa mta aag gaa gct yta tta gat aca gga gca gat gat aca gta 96
Gly Gln Xaa Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aaa cag tat gat cag gta arc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Lys Gln Tyr Asp Gln Val Xaa Ile
50 55 60



gaa atc tgt gga cat aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aay ctg ttg aca cag att ggt tgy act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca ara gty aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Xaa Xaa Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aar gca tta atg gaa att tgt gca gay atg gaa aag gaa ggr 432
Lys Ile Lys Ala Leu Met Glu Ile Cys Ala Asp Met Glu Lys Glu Xaa
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcy ata aag aaa aaa gac agc act aaa tgg aga aaa tta gta gat ttc 528
Xaa Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac ttt tgg gaa gtc caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccy gca ggg tta aaa aag aac aaa tca gta aca gta ttg 624
Ile Pro His Xaa Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccy tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Xaa Leu Asp Lys Asp Phe Arg
210 215 220

aaa tay act gca ttt acm ata cct agt ata aat aat gca aca cca ggg 720 
Lys Tyr Thr Ala Phe Xaa lie Pro Ser lie Asn Asn Ala Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tea cca 768 
lie Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aaa ate tta gag cct ttt aga rar 816 
Ala lie Phe Gin Ser Ser Met Thr Lys He Leu Glu Pro Phe Arg Xaa 
260 265 270 

cag aat cca gac ata gtt ate tat caa tac atg gat gay ttg tat gta 864 
Gin Asn Pro Asp He Val He Tyr Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa mta ggg cag cat aga rca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu Xaa Gly Gin His Arg Xaa Lys He Glu Glu Leu 
290 295 300 

aga caa cat ctg tta agg tgg ggg ttt ace acw cca gac aag aaa cat 960 
Arg Gin His Leu Leu Arg Trp Gly Phe Thr Xaa Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta car ccc ata gtg ttg cca gaa aaa gac age tgg act 1056 
Lys Trp Thr Val Gin Pro He Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104 
Val Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365

att tay gsa ggg att 1119 
He Tyr Xaa Gly He 
370 

<210> 71 
<211> 1119 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 



<221> CDS 

<222> (298) . . . (1119) 

<223> Portion of HIV Reverse Transcriptase 
<400> 71

cct caa atc act ctt tgg caa cga ccc atc gtc tca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Ser Ile Lys Ile Gly
1 5 10 15

ggg gca aat aaa gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Ala Asn Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aag cca aaa atg ata gtg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Val
35 40 45

gga att gga ggt ttt agc aaa gta aga caa tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ser Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgc gga cgt aaa gtt gta ggt tca gta tta ata gga cct aca 240
Glu Ile Cys Gly Arg Lys Val Val Gly Ser Val Leu Ile Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag ctt ggc tgt act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca aaa gag 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Lys Glu
115 120 125

aaa ata aaa gca tta ata gaa att tgt aca gaa ttg gaa gaa gma gga 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Leu Glu Glu Xaa Gly
130 135 140

aaa att aca aaa att ggg cct gaa aat ccg tac aat act cca ata ttt 480
Lys Ile Thr Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aaa aar aac agt act aaa tgg aga aaa tta gta gac ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

agg gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca att ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Asp Lys Asp Phe Arg
210 215 220

aar tat act gca ttt acc ata cct agt acg aat aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tat aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt age atg aca aaa ate tta gag cct ttt aga aaa 816 

Ala He Phe Gin Ser Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat ccc gac ata gtt ate tat caa tac gtg gat gat ttg ctt gta 864 

Gin Asn Pro Asp He Val He Tyr Gin Tyr Val Asp Asp Leu Leu Val 
275 280 285 

gga tct gac tta gaa ata gag cag cat aga aca aaa ata gag gag eta 912 

Gly Ser Asp Leu Glu He Glu Gin His Arg Thr Lys He Glu Glu Leu 

290 295 300 

aga caa cat ctg tgg aag tgg gga ttt tac aca cca gac aaa aaa cat 960 

Arg Gin His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa etc cat cct gat 1008 

Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac age tgg act 1056 

Lys Trp Thr Val Gin Pro He Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata caa aag tta gtg gga aaa tta aat tgg gew agt cag 1104 

Val Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Xaa Ser Gin 
355 360 365 

att tat cca ggg att 1119 
He Tyr Pro Gly He 
370 



<210> 72 
<211> 1119 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)



<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1119) 

<223> Portion of HIV Reverse Transcriptase 



<400> 72 

cct cag ate act ctt tgg caa cga ccc cty gtc aca ata aag ate ggg 48 
Pro Gin He Thr Leu Trp Gin Arg Pro Xaa Val Thr He Lys He Gly 
1 5 10 15 

ggg caa tta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 
ata gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Ile Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt rtc aaa gta aga caa tat gat cag gta ccc ata 192
Gly Ile Gly Gly Phe Xaa Lys Val Arg Gln Tyr Asp Gln Val Pro Ile
50 55 60

gaa att tgc gga cat aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gyc aac ata att gga aga aac ctg ttg act caa ctt ggc tgc act 288
Pro Xaa Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt cca att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ctg gaa aaa gga agg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Gly Arg
130 135 140

aaa aat tac aaa att ggg cct gaa aac cca tac aat act cca gta ttt 480
Lys Asn Tyr Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttc tca gtt ccc tta gat aag gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agc ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctc cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gcm ata ttc caa tgt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Xaa Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

ggg tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu He Gly Gin His Arg Thr Lys He Glu Glu Leu 
290 295 300

aga cga cat ctg ttg aag tgg gga ttt tac aca cca gac aaa aaa cat 960 
Arg Arg His Leu Leu Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His 
305 310 315 320

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gag etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta caa cct ata gtg eta cca gag aaa gac age tgg act 1056 
Lys Trp Thr Val Gin Pro He Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aag tta aat tgg gca agt cag 1104 
Val Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

ata tac gca ggg att 1119 
He Tyr Ala Gly He 
370 

<210> 73 
<211> 1119 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 



<221> CDS 

<222> (298) . . . (1119) 

<223> Portion of HIV Reverse Transcriptase 



<400> 73 

cct caa ate act ctt tgg caa cga ccc ttc gtc aca gta aag ata ggg 48 
Pro Gin He Thr Leu Trp Gin Arg Pro Phe Val Thr Val Lys He Gly 
1 5 10 15 

ggg cag eta aag gaa get eta tta gat aca gga gca gat aat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asn Thr Val 
20 25 30 

tta gaa gaa atg aat tta ccg gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga cag tat gat cag rta ccc ata 192 
Gly He Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin Xaa Pro He 
50 55 60 
gaa atc tgt gga cac aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga gat ctg ttg act cag ctt ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asp Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gat act gta cca gta aaa tta aaa 336
Leu Asn Phe Pro Ile Ser Pro Ile Asp Thr Val Pro Val Lys Leu Lys
100 105 110



cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ctg gaa aag gaa ggg 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Leu Glu Lys Glu Gly 
130 135 140 

aag att tea aaa att ggg cct gaa aat cca tac aat acc cca gta ttt 480 
Lys He Ser Lys lie Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

get ata aag aaa aaa gac agt act aaa tgg aga aag tta gta gat ttc 528 
Ala He Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat ccc gcg ggg tta aaa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205



gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr He Pro Ser He Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt ccc cag gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt agc atg aca aaa att tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

cag aat cca gac ata gtt atc tac caa tac gtg gat gac ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga gca aaa ata gat gag ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Asp Glu Leu
290 295 300

agg caa cat ctg ttg aag tgg gga ttt tac aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cca cca ttc ctt tgg atg ggk tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Xaa Tyr Glu Leu His Pro Asp
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gaa aar gac age tgg act 1056 
Lys Trp Thr Val Gin Pro lie Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104 
Val Asn Asp lie Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tac cca ggg att 1119 
He Tyr Pro Gly He 
370 

<210> 74 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 74 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag gtc ggg 48

Pro Gin He Thr Leu Trp Gin Arg Pro Leu Val Thr He Lys Val Gly 
1 5 10 15 

ggg caa eta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gag gaa eta aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Leu Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aaa cag tat gat cag ata ccc ata 192 
Gly He Gly Gly Phe He Lys Val Lys Gin Tyr Asp Gin He Pro He 
50 55 60 

gaa ata tgt gga cat aaa gct att ggt aca gta tta gta gga cct aca 240

Glu He Cys Gly His Lys Ala He Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aac ttg ttg act cag ctt ggt tgc act 288 
Pro Val Asn He He Gly Arg Asn Leu Leu Thr Gin Leu Gly Cys Thr 
85 90 95 
tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta aca gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Thr Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggt tta aaa aag aaa aaa tca gta aca gtc ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agt ata aac aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240



att aga tac cag tac aat gtg ctt ccc cag ggg tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aaa ate tta gag cct ttt agg aaa 816 
Ala He Phe Gin Ser Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac ata gtt ate tac caa tac atg gat gat ttg tat gta 864 
Gin Asn Pro Asp He Val lie Tyr Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu He Gly Gin His Arg Thr Lys He Glu Glu Leu 
290 295 300 

aga cag cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gag ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gat agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370 

<210> 75 
<211> 819 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (819) 

<223> Portion of HIV Reverse Transcriptase 
<400> 75 

ccc att agt cct att gam act gta cca gta aaa tta aag cca gga atg 48 
Pro lie Ser Pro lie Xaa Thr Val Pro Val Lys Leu Lys Pro Gly Met 
1 5 10 15

gat ggc cca aaa gtt aaa caa tgg cca tta aca gag gaa aaa ata aaa 96 
Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu Lys lie Lys 
20 25 30 

gca ttg gta gaa att tgt aca gaa atg gaa aag gaa gga aaa att tea 144 
Ala Leu Val Glu lie Cys Thr Glu Met Glu Lys Glu Gly Lys lie Ser 
35 40 45 

aaa att ggg cct gaa aat cca tac aat act cca gta ttt gec ata aag 192 
Lys lie Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe Ala lie Lys 
50 55 60 

aaa aag gac agt act aaa tgg aga aaa tta gta gat ttc aga gaa ctt 240 
Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe Arg Glu Leu 
65 70 75 80

aat aar aga act caa gat ttc tgg gaa gtt caa tta gga ata cca cat 288 
Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly lie Pro His 
85 90 95 

ccc tea ggg tta aaa aag aay aaa tea gta aca gta ttg gat gtg ggt 336 
Pro Ser Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu Asp Val Gly 
100 105 110 

gat gca tat ttt tea gtt ccy tta gat aaa gac ttc agg aag tat act 384 
Asp Ala Tyr Phe Ser Val Xaa Leu Asp Lys Asp Phe Arg Lys Tyr Thr 
115 120 125 



gca ttt acc ata cct agt ata aac aat gag aca cca ggg att agr tat 432
Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly Ile Xaa Tyr
130 135 140

cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca gca ata ttc 480
Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro Ala Ile Phe
145 150 155 160

caa agt age atg aca aaa ate tta gag cct ttt aga aaa cat aat cca 528 
Gin Ser Ser Met Thr Lys lie Leu Glu Pro Phe Arg Lys His Asn Pro 
165 170 175 

gac ata gtt ate tat caa tac gtg gat gat ttg tat gta gga tct gac 576 
Asp lie Val lie Tyr Gin Tyr Val Asp Asp Leu Tyr Val Gly Ser Asp 
180 185 190

tta gaa ata gag gag cat aga aca aaa ata gag gaa ctg agr vrg cat 624 
Leu Glu He Glu Glu His Arg Thr Lys He Glu Glu Leu Xaa Xaa His 
195 200 205 

ctg tta aag tgg gga ttt acy aca cca gac aaa aag cat cag aaa gaa 672 
Leu Leu Lys Trp Gly Phe Xaa Thr Pro Asp Lys Lys His Gin Lys Glu 
210 215 220 

cct cca ttt ctt tgg atg ggt tat gaa etc cat cct gat aaa tgg aca 720 
Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp Lys Trp Thr 
225 230 235 240 

gta cag cct ata aag ctg cca gaa aaa gac age tgg act gtc aat gac 768 
Val Gin Pro He Lys Leu Pro Glu Lys Asp Ser Trp Thr Val Asn Asp 
245 250 255 

ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag att tat gca 816 
He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin He Tyr Ala 
260 265 270 

ggg 819 
Gly 



<210> 76 
<211> 819 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (819) 

<223> Portion of HIV Reverse Transcriptase 
<400> 76 

ccc att agt cct att gaa act gta cca gta aaa tta aag cca gga atg 48 
Pro He Ser Pro lie Glu Thr Val Pro Val Lys Leu Lys Pro Gly Met 
1 5 10 15

gat ggc cca aaa gty aaa caa tgg cca tta aca gaa gaa aaa ata aga 96 
Asp Gly Pro Lys Xaa Lys Gin Trp Pro Leu Thr Glu Glu Lys He Arg 
20 25 30 

gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga aaa att tea 144 
Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly Lys He Ser 
35 40 45 
aga gaa cat ctg tta arg tgg gga ttt acc aca cca gac aaa aag cat 960
Arg Glu His Leu Leu Xaa Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata cag ctg cca gaa aag gaa age tgg act 1056 
Lys Trp Thr Val Gin Pro lie Gin Leu Pro Glu Lys Glu Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104 
Val Asn Asp lie Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tat gca ggg 1116 
He Tyr Ala Gly 
370 

<210> 82 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 82 

cct cag ate act ctt tgg caa cga ccc etc gtc aca gta aag ata ggg 48 
Pro Gin He Thr Leu Trp Gin Arg Pro Leu Val Thr Val Lys He Gly 
1 5 10 15

ggg caa tta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg agt tta cca gga aaa tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Ser Leu Pro Gly Lys Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga cag tat gat cag ata ctt gta 192 
Gly He Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin He Leu Val 
50 55 60 



gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80



ccc gtc aac ata att gga aga aat ctg ttg act cag att ggg tgc act 288 
Pro Val Asn He He Gly Arg Asn Leu Leu Thr Gin He Gly Cys Thr 
85 90 95 
aaa att ggg cct gaa aat cca tac aat act cca gtg ttt gct ata aag 192
Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe Ala Ile Lys
50 55 60

aaa aaa gac agt act aar tgg aga aaa ttg gta gat ttc aga gaa ctt 240 
Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe Arg Glu Leu 
65 70 75 80 

aat aag aga act caa gac ttc tgg gaa gtt caa tta gga ata cca cat 288 
Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly He Pro His 
85 90 95 

ccc tea ggg tta aaa aag aaa aaa tea gta aca gta ctg gat gtg ggt 336 
Pro Ser Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu Asp Val Gly 
100 105 110 

gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg aag tat act 384
Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg Lys Tyr Thr
115 120 125

gca ttt act atn cct agt ata aac aat gag aca cca ggg att agg tat 432
Ala Phe Thr Xaa Pro Ser Ile Asn Asn Glu Thr Pro Gly Ile Arg Tyr
130 135 140

cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca gca ata ttc 480
Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro Ala Ile Phe
145 150 155 160

caa agt agc atg aca aaa atc tta gag cct ttt aga aaa caa aat cca 528
Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys Gln Asn Pro
165 170 175

gac ata gtt atc tat caa tac gtg gat gat ttg tat gta gga tct gac 576
Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val Gly Ser Asp
180 185 190

cta gaa ata gga cag cat aga aca aaa ata gag gaa ctg aga cag cat 624
Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu Arg Gln His
195 200 205

ctg ttg agg tgg gga ttt acc aca cca gac aag aaa cat cag aaa gaa 672
Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His Gln Lys Glu
210 215 220

cct ccc ttt ctt tgg atg ggc tat gaa ctc cat cct gat aaa tgg aca 720
Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp Lys Trp Thr
225 230 235 240

gta cag cct ata gag ctg cca gac aag gat agc tgg act gtc aat gac 768
Val Gln Pro Ile Glu Leu Pro Asp Lys Asp Ser Trp Thr Val Asn Asp
245 250 255

ata cag aag tta gtg gga aaa tta aat tgg gca agt cag ata tat gca 816
Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln Ile Tyr Ala
260 265 270

ggg 819
Gly

<210> 77

<211> 1116 

<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 77 

cct cag ate act ctt tgg caa cga ccc etc gtc aca ata aag ata ggg 48 
Pro Gin lie Thr Leu Trp Gin Arg Pro Leu Val Thr lie Lys lie Gly 
1 5 10 15 

ggg caa eta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gac atg aat ttg cca ggg aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Asp Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met lie Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga cag tat gat cag ata cct ata 192 
Gly He Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin lie Pro He 
50 55 60 

gaa ate tgc gga cat aaa get gta ggt aaa gta tta gta gga cct aca 240 
Glu He Cys Gly His Lys Ala Val Gly Lys Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ctg ttg act caa ctt ggt tgc act 288 
Pro Val Asn He He Gly Arg Asn Leu Leu Thr Gin Leu Gly Cys Thr 
85 90 95 

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro lie Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110

cca gga atg gat ggc cca aaa gtt aag caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gag aag gaa ggg 432 
Lys lie Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys lie Ser Lys lie Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

get ata aag aaa aaa aac agt act aga tgg aga aaa tta gta gat ttc 528 
Ala He Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe 
165 170 175 
aga gaa ctt aat aag aga acg caa gac ttc tgg gaa gtt caa nnn nnn 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Xaa Xaa
180 185 190

nnn nnn nnn nnn nnn ggg twa aaa aag aaa aaa tea gta aca gta ctg 624 
Xaa Xaa Xaa Xaa Xaa Gly Xaa Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gta ggt gat gca tat ttc tea gtt cct eta gat aaa gac ttc aga 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg 
210 215 220 

aag tac act gca ttc acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctg cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gtg 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ttg ttg aag tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agc cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

ata tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 78
<211> 1122
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1122) 

<223> Portion of HIV Reverse Transcriptase 
<400> 78 

cct cag ate act ctt tgg caa cga ccc etc gtc aca ata aag ata ggg 48 
Pro Gin He Thr Leu Trp Gin Arg Pro Leu Val Thr He Lys He Gly 
1 5 10 15

ggg caa eta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gac atg gat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Asp Met Asp Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aaa cag tat gat cag ata ccc ata 192 
Gly He Gly Gly Phe He Lys Val Lys Gin Tyr Asp Gin He Pro He 
50 55 60 

gaa ate tgt gga cat aaa gtt ata ggt aca gta tta gta gga cct aca 240 
Glu lie Cys Gly His Lys Val He Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288 
Pro Val Asn lie He Gly Arg Asn Leu Leu Thr Gin Leu Gly Cys Thr 
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro lie Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Arg Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125

aaa ata aaa gca ttg gta gaa ata tgt aca gaa atg gaa aag gaa ggg 432 
Lys lie Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aat acr cca gta ttt 480 
Lys lie Ser Lys lie Gly Pro Glu Asn Pro Tyr Asn Xaa Pro Val Phe 
145 150 155 160 

gee ata arg aaa aaa gaa age tct age tct aaa tgg aga aaa tta gta 528 
Ala lie Xaa Lys Lys Glu Ser Ser Ser Ser Lys Trp Arg Lys Leu Val 
165 170 175 

gat ttc aga gaa ctt aat aar aga act caa gac ttt ttk gaa gtt caa 576 
Asp Phe Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Xaa Glu Val Gin 
180 185 190 

tta gga ata cca cat ccc gca ggg tta aag aag aaa aaa tea gya aca 624 
Leu Gly lie Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Xaa Thr 
195 200 205 
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rta ttg gat gtg ggt gat gca tat ttt tea gtt ccc tta gat raa gac 672 
Xaa Leu Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Xaa Asp 
210 215 220 

ttc agg aag tat act gca ttt acc ata cct agt ata aac aat gag aca 720 
Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr
225 230 235 240



cca ggg att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga 768
Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly
245 250 255

tea cca get ata ttc caa agt age atg aca aaa ate tta gag cct ttt 816 
Ser Pro Ala He Phe Gin Ser Ser Met Thr Lys He Leu Glu Pro Phe 
260 265 270 

aga aaa caa aat cca gay ata gtt ate tat caa tac atg gat gat ttg 864 
Arg Lys Gin Asn Pro Asp He Val He Tyr Gin Tyr Met Asp Asp Leu 
275 280 285 

tat gta gga tct gay tta gaa ata gag cag cat aga ata aaa ata gag 912 
Tyr Val Gly Ser Asp Leu Glu He Glu Gin His Arg He Lys He Glu 
290 295 300 

gaa ctg aga caa yat ytg tgg arg tgg ggr ttt tac aca cca gac aaa 960 
Glu Leu Arg Gln Xaa Xaa Trp Xaa Trp Xaa Phe Tyr Thr Pro Asp Lys
305 310 315 320

aaa cat cag aaa gaa cct cca ttc cat tgg atg ggt tat gaa etc cat 1008 
Lys His Gin Lys Glu Pro Pro Phe His Trp Met Gly Tyr Glu Leu His 
325 330 335 

cct gat aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac age 1056 
Pro Asp Lys Trp Thr Val Gin Pro He Val Leu Pro Glu Lys Asp Ser 
340 345 350 

tgg act gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca 1104 
Trp Thr Val Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala 
355 360 365 

agt cag att tat gca ggr 1122 
Ser Gln Ile Tyr Ala Xaa
370

<210> 79
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 79

cct cag atc act ctt tgg caa cga ccc ctc gtt aca ata aag gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Val Gly
1 5 10 15

ggg caa eta aag gaa get eta tta gat aca gga gca gac aat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asn Thr Val 
20 25 30 

ttc gaa gac ctg gat tta cca gga agg tgg aaa cca aaa atg ata ggg 144 
Phe Glu Asp Leu Asp Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aaa cag tat gag cag ata ccc ata 192 
Gly He Gly Gly Phe He Lys Val Lys Gin Tyr Glu Gin He Pro He 
50 55 60 

gaa atc tgt ggg cgt aaa gct ata ggt aca gtg tta gta gga cct aca 240
Glu Ile Cys Gly Arg Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga gat ctg ttg act cag att ggt tgc act 288 
Pro Val Asn He He Gly Arg Asp Leu Leu Thr Gin He Gly Cys Thr 
85 90 95 

eta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Arg Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta ata gaa att tgt gca gaa atg gaa aag gaa ggg 432 
Lys He Lys Ala Leu He Glu He Cys Ala Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gee ata aag aaa aag aac agt aat aaa tgg aga aaa tta gta gat ttc 528 
Ala He Lys Lys Lys Asn Ser Asn Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat ccc gca ggg tta aaa aag aaa aag tea ata aca gta tta 624 
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Ile Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttc tea gtt ccc tta gat gaa gac ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg 
210 215 220 

aag tat act gca ttt ace ata cct agt aca aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctg cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255 

gca ata ttc caa agt age atg aca aaa att tta gag cct ttt aga aaa 816 
Ala He Phe Gin Ser Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gac ata gtt ate tat caa tac atg gat gat ttg tat gta 864 
Gin Asn Pro Asp He Val He Tyr Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gat tta gaa ata gag cag cat aga aca aaa ata gat gaa ctg 912 
Gly Ser Asp Leu Glu lie Glu Gin His Arg Thr Lys lie Asp Glu Leu 
290 295 300 

aga caa cat ctg ttg agg tgg gga ctt acc aca cca gac cag aaa cat 960 
Arg Gin His Leu Leu Arg Trp Gly Leu Thr Thr Pro Asp Gin Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gtg ctg cca gac aaa gac age tgg act 1056 
Lys Trp Thr Val Gin Pro lie Val Leu Pro Asp Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg ggr aaa ttg aat tgg gca agt caa 1104 
Val Asn Asp lie Gin Lys Leu Val Xaa Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tac cca ggg 1116 
Ile Tyr Pro Gly
370

<210> 80
<211> 1119
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1119)
<223> Portion of HIV Reverse Transcriptase
<400> 80

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag eta aag gag get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ctc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile
50 55 60



gaa att tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg acw cag att ggt tgc act 288 
Pro Val Asn He He Gly Arg Asn Leu Leu Xaa Gin He Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Arg Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aar gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca rta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Xaa Phe
145 150 155 160

gec ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528 
Ala He Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag agg act caa gat ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat ccc gca ggg ttg aaa aag aaa aaa tea gta aca gta ctg 624 
He Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttc tea gtt ccc tta gat gaa gac ttc aga 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg 
210 215 220 

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr He Pro Ser He Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aaa ate tta gag cct ttt aga aaa 816 
Ala He Phe Gin Ser Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 
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caa aat cca gac ata gtc ate tat caa tac atg gat gat ttg tat gta 864 
Gin Asn Pro Asp lie Val He Tyr Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata ggg cag cat agg aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu He Gly Gin His Arg Thr Lys He Glu Glu Leu 
290 295 300 

aga caa cat ttg ttg aag tgg ggg ttt ace aca cca gac aaa aaa cat 960 
Arg Gin His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gtg cag cct ata gtg tta ccg gaa aaa gac age tgg act 1056 
Lys Trp Thr Val Gin Pro He Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104 
Val Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tac cca ggg att 1119 
Ile Tyr Pro Gly Ile
370

<210> 81
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 81

cct caa atc act ctt tgg caa cga ccy ctt gtt rcc ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Xaa Leu Val Xaa Ile Lys Ile Gly
1 5 10 15

ggg caa eta arg gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Xaa Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa ata aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu He Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aaa cag tat gat caa ata ccy rta 192 
Gly Ile Gly Gly Phe Ile Lys Val Lys Gln Tyr Asp Gln Ile Xaa Xaa
50 55 60

gaa att tgt gga cat aga gct ata ggt aca gtw tta gta gga cct aca 240
Glu Ile Cys Gly His Arg Ala Ile Gly Thr Xaa Leu Val Gly Pro Thr
65 70 75 80 

cct gtc aac ata att gga agr aat ctg ttg act cag att ggt tgc act 288 
Pro Val Asn He He Gly Xaa Asn Leu Leu Thr Gin He Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca ttg gta gaa att tgt aca gaa atg gaa aag gaa gga 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aga att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Arg Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aar gat agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175 

agg gaa ctt aat aag agg act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat ccc gca ggg tta aaa aag aaa aaa tea gta aca gta ctg 624 
He Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat aaa gaa ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg 
210 215 220 

aag tat act gca ttt act ata cct agt aca aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr He Pro Ser Thr Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat caa tac aat gtg ctt cca caa gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aaa ate tta gag cca ttt aga aaa 816 
Ala He Phe Gin Ser Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gaa ata gtc ate tat caa tac gtg gat gat ttg tat gta 864 
Gin Asn Pro Glu He Val He Tyr Gin Tyr Val Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gaa ata gga cag cat aga aca aaa ata gag gaa yta 912 
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Xaa
290 295 300

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys He Ser Lys He Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gec ata aaa aag aaa gac agt act aaa tgg aga aag tta gta gat ttc 528 
Ala He Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aay aaa aag act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Lys Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat ccc gca ggg tta aaa aag aaa aaa tea gta aca gta ctg 624 
He Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat aaa gam ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Xaa Phe Arg 
210 215 220 

aar tat act gca ttt acc ata cct agt gta aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr He Pro Ser Val Asn Asn Glu Thr Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tay cag tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggr aag cac aga aca aaa ata gag gag cta 912
Gly Ser Asp Leu Glu Ile Xaa Lys His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga cag cat ctg ttg aag tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctk tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Xaa Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata aaa ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Lys Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350 

gty aat gac ata cag aag tta gtg gga aaa ttr aat tgg gcc agt cag 1104
Xaa Asn Asp Ile Gln Lys Leu Val Gly Lys Xaa Asn Trp Ala Ser Gln
355 360 365 

att tat gca ggg 1116 
Ile Tyr Ala Gly
370

<210> 83
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 83

cct cag atc act ctt tgg caa cga cca ctc gtc gca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Ala Ile Lys Ile Gly
1 5 10 15

ggg caa eta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gac atg agt tta cca gga aaa tgg aaa cca aaa atg ata ggg 144 
Leu Glu Asp Met Ser Leu Pro Gly Lys Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga cag tat gat caa gta ccc ata 192 
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Val Pro Ile
50 55 60

gaa ate tgt gga cat aaa get ata ggt aca gta tta gta gga cct aca 240 
Glu He Cys Gly His Lys Ala He Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288 
Pro Val Asn He He Gly Arg Asn Leu Leu Thr Gin Leu Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tat aat act cca gta ttt 480 
Lys lie Ser Lys lie Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe 
145 150 155 160 

gec ata aag aaa aaa gac agt act aaa tgg agg aaa tta gta gat ttt 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gat ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat cca gca ggg tta aaa aag aaa aag tea gta aca gtg ctg 624 
lie Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttt tea gtt ccc tta gat aaa gaa ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg 
210 215 220 

aag tat act gca ttt acc ata ccc agt ata aac aat gag aca ccc agg 720 
Lys Tyr Thr Ala Phe Thr lie Pro Ser lie Asn Asn Glu Thr Pro Arg 
225 230 235 240 

gtt aga tat caa tac aat gta ctt cca cag gga tgg aaa gga tea cca 768 
Val Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca tat ttc caa agt age atg aca aaa ate tta gaa ccc ttc aga aaa 816 
Ala Tyr Phe Gin Ser Ser Met Thr Lys lie Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aac cca gac ata gtt ate tat caa tac atg gat gac tta tat gta 864 
Gin Asn Pro Asp lie Val lie Tyr Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gga tct gac tta gag ata gga cag cat aga gca aaa ata gag gac eta 912 
Gly Ser Asp Leu Glu He Gly Gin His Arg Ala Lys He Glu Asp Leu 
290 295 300 

aga gca cat ctg ttg aag tgg ggg ttt acc aca cca gac aaa aaa cat 960 
Arg Ala His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His 
305 310 315 320

cag aaa gaa ccc cca ttt etc tgg atg ggt tat gaa etc cat cct gat 1008 
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gwg eta cca gaa aaa gac age tgg act 1056 
Lys Trp Thr Val Gin Pro He Xaa Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aaa tta gta gga aaa tta aat tgg gca agt cag 1104 
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370

<210> 84
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 84

cct caa atc act ctt tgg caa cga ccc att gtc aca ata aaa gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Val Gly
1 5 10 15

ggg caa eta atg gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Met Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gac ata aat ttg cca gga aga tgg aaa cca aaa ata ata ggg 144 
Leu Glu Asp He Asn Leu Pro Gly Arg Trp Lys Pro Lys He He Gly 
35 40 45 

gga att ggt ggt ttt gtc aaa gtg aga cag tat gat cag gta ccc ata 192 
Gly He Gly Gly Phe Val Lys Val Arg Gin Tyr Asp Gin Val Pro He 
50 55 60 

gaa atc tgt gga cat aaa gtt ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct acc aac gta gtt gga aga aat ctg atg act cag att ggc tgc acy 288
Pro Thr Asn Val Val Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Xaa
85 90 95

tta aat ttt cct att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg acg gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa ctg gaa aag gat gga 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Leu Glu Lys Asp Gly 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tat aat act cca ata ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aaa aag aac agt gat aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Asp Lys Trp Arg Lys Leu Val Asp Phe
165 170 175 

aga gaa ctt aat aar aga act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat cct gca ggg tta aaa aag aat aaa tea gta aca gta ctg 624 
lie Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu 
195 200 205 

gat ata ggt gat gca tat ttt tea att ccc tta gat aaa gac ttt agg 672 
Asp lie Gly Asp Ala Tyr Phe Ser lie Pro Leu Asp Lys Asp Phe Arg 
210 215 220 

aag tat act gca ttc acc ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr lie Pro Ser lie Asn Asn Glu Thr Pro Gly 
225 230 235 240 

gtt aga tat cag tac aat gtg ctt cca cag gga tgg aag gga tea cca 768 
Val Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa age age atg acc aaa ate tta gag cct ttt aga aaa 816 
Ala lie Phe Gin Ser Ser Met Thr Lys lie Leu Glu Pro Phe Arg Lys 
260 265 270 

cag aat cca gac ata gtt ate tgc caa tac gtg gat gat ttg tat gta 864 
Gln Asn Pro Asp Ile Val Ile Cys Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctr 912 
Gly Ser Asp Leu Glu lie Gly Gin His Arg Thr Lys lie Glu Glu Xaa 
290 295 300 

agg aat yat ctg tgg aag tgg gga ttt tac aca cca gac aaa aaa tat 960 
Arg Asn Xaa Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys Tyr 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag ccc ata gtg ctg cca gaa aaa gac age tgg act 1056 
Lys Trp Thr Val Gin Pro lie Val Leu Pro Glu Lys Asp Ser Trp Thr 
340 345 350 

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104 
Val Asn Asp lie Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

att tat cca ggg 1116 
Ile Tyr Pro Gly
370

<210> 85
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 85

cct caa ate act ctt tgg caa cga ccc etc gtc aca ata aaa gta ggg 48 
Pro Gin He Thr Leu Trp Gin Arg Pro Leu Val Thr He Lys Val Gly 
1 5 10 15 

ggg caa eta aag gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly Gin Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat ttg cca ggg aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga cag tat gat cag gta age ata 192 
Gly He Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin Val Ser He 
50 55 60 

gaa ate tgt gga cat aaa get ata ggt aca gta tta ata gga ccc acc 240 
Glu He Cys Gly His Lys Ala He Gly Thr Val Leu He Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288 
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Arg Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gag aag gaa ggr 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Xaa 
130 135 140 

aaa att tea aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480 
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gec ata aar aaa aaa gac agt act aaa tgg aga aag tta gta gat ttc 528 
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aaa ara act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Xaa Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aam aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Xaa Lys Ser Val Thr Val Leu
195 200 205 

gay gtg ggt gat gcr tat ttt tea gtt ccy tta gay aaa gay ttc agg 672 
Asp Val Gly Asp Xaa Tyr Phe Ser Val Xaa Leu Asp Lys Asp Phe Arg 
210 215 220 

aag tac aca gca ttt acc ata cct agt gta aac aat gag rca cca ggg 720 
Lys Tyr Thr Ala Phe Thr lie Pro Ser Val Asn Asn Glu Xaa Pro Gly 
225 230 235 240 

att aga tat cag tac aat gtg ctt cca car gga tgg aaa gga tea cca 768 
lie Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa agt age atg aca aaa ate tta gag cct ttt aga aar 816 
Ala lie Phe Gin Ser Ser Met Thr Lys lie Leu Glu Pro Phe Arg Lys 
260 265 270 

maa aat cca gac ata gty ate tay caa tac atg gat gat ttr tat gta 864 
Xaa Asn Pro Asp He Xaa He Tyr Gin Tyr Met Asp Asp Xaa Tyr Val 
275 280 285 

gga tct gac tta gaa ata gga cag cat aga aca aaa ata gag gaa ctg 912 
Gly Ser Asp Leu Glu He Gly Gin His Arg Thr Lys He Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg cag tgg ggg tta acc aca cca gac aaa aaa cat 960 
Arg Gin His Leu Leu Gin Trp Gly Leu Thr Thr Pro Asp Lys Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggg tat gaa etc cat ccg gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata wtg ctg cca gac aaa gac age tgg act 1056 
Lys Trp Thr Val Gin Pro He Xaa Leu Pro Asp Lys Asp Ser Trp Thr 
340 345 350 

gtm aat gac ata cag aar tta gta gga aaa ttg aat tgg gcg agt cag 1104 
Xaa Asn Asp He Gin Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gin 
355 360 365 

atc tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 86
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 86

cct caa atc act ctt tgg caa cga ccc atc gtc aca gta aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Val Lys Ile Gly
1 5 10 15

ggg cac aca acg gaa get eta tta gat aca gga gca gat gat aca gta 96 
Gly His Thr Thr Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val 
20 25 30 

tta gaa gaa atg aat ttg cca ggg aga tgg aaa cca aaa atg ata gga 144 
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met He Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga cag tat gag cag gta ccc ata 192 
Gly He Gly Gly Phe He Lys Val Arg Gin Tyr Glu Gin Val Pro He 
50 55 60 

gaa ttc tgt gga cat aaa act gta ggt aca gta tta ata gga cct aca 240 
Glu Phe Cys Gly His Lys Thr Val Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg atg act cag att ggt tgt act 288 
Pro Val Asn He He Gly Arg Asn Leu Met Thr Gin He Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggg ccc aaa gtt aaa cca tgg cca ttg aca gaa aga 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Pro Trp Pro Leu Thr Glu Arg 
115 120 125 

aaa aat aaa gca tta gta gaa att tgt tec gaa atg gaa aaa gga agg 432 
Lys Asn Lys Ala Leu Val Glu He Cys Ser Glu Met Glu Lys Gly Arg 
130 135 140 



aaa att tca aaa att ggg cct gag aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag aac agt act aga tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aaa aga act caa gac ttc tgg gaa gtt cag tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gta ggt gat gca tat ttt tea gtt ccc tta gat gaa gaa ttc agg 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Glu Phe Arg 
210 215 220

aag tat act gca ttc acc ata cct agt aca aac aat gaa aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tea cca 768 
He Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255 

gca ata ttc caa tgt age atg aca aaa ate tta gag ccc ttt aga aaa 816 
Ala He Phe Gin Cys Ser Met Thr Lys He Leu Glu Pro Phe Arg Lys 
260 265 270 

caa aat cca gaa ata gtt ate tgt cag tac atg gat gac ttg tat gta 864 
Gin Asn Pro Glu He Val He Cys Gin Tyr Met Asp Asp Leu Tyr Val 
275 280 285 

gca tct gat tta gaa ata ggg cag cat aga aca aaa gta gag gaa ctg 912 
Ala Ser Asp Leu Glu He Gly Gin His Arg Thr Lys Val Glu Glu Leu 
290 295 300 

aga caa cat ctg ttg aag tgg ggg ttt ttc aca cca gac gaa aaa cat 960 
Arg Gin His Leu Leu Lys Trp Gly Phe Phe Thr Pro Asp Glu Lys His 
305 310 315 320 

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa etc cat cct gat 1008 
Gin Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp 
325 330 335 

aaa tgg aca gta cag cct ata gta ctg cca gac caa gac age tgg act 1056 
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Asp Gln Asp Ser Trp Thr
340 345 350

gtc aat gat ata cag aag tta gtg gga aaa tta aat tgg gca agt caa 1104 
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116 
Ile Tyr Pro Gly
370

<210> 87
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>
<221> CDS
<222> (0)...(297)
<223> HIV Protease

<221> CDS
<222> (298)...(1116)
<223> Portion of HIV Reverse Transcriptase
<400> 87

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata gag 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Glu
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30 

tta gaa gaa atg aat ttg tea gga aga tgg aaa cca aaa atg ata ggg 144 
Leu Glu Glu Met Asn Leu Ser Gly Arg Trp Lys Pro Lys Met lie Gly 
35 40 45 

gga att gga ggt ttt ate aaa gta aga cag tat gat cag ata ccc ata 192 
Gly lie Gly Gly Phe He Lys Val Arg Gin Tyr Asp Gin He Pro lie 
50 55 60 

gag ate tgt gga cat aaa get gta ggt aca gta tta gta gga cct aca 240 
Glu He Cys Gly His Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr 
65 70 75 80 

cct gtc aac ata att gga agr aat ctg ttg act cag att ggt tgc acc 288 
Pro Val Asn He He Gly Xaa Asn Leu Leu Thr Gin He Gly Cys Thr 
85 90 95 

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336 
Leu Asn Phe Pro He Ser Pro He Glu Thr Val Pro Val Lys Leu Lys 
100 105 110 

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384 
Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro Leu Thr Glu Glu 
115 120 125 

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432 
Lys He Lys Ala Leu Val Glu He Cys Thr Glu Met Glu Lys Glu Gly 
130 135 140 

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gec ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat tty 528 
Ala He Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe 
165 170 175 

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576 
Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu Val Gin Leu Gly 
180 185 190 

ata cca cat ccy gca ggg ttg aar aag aaa aaa tea gta aca gta ctg 624 
He Pro His Xaa Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu 
195 200 205 

gat gtg ggt gat gca tat ttc tea gtt ccc tta gat gaa gay ttc aga 672 
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg 
210 * 215 220 

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720 
Lys Tyr Thr Ala Phe Thr He Pro Ser He Asn Asn Glu Thr Pro Gly 
225 230 235 240 

gtt aga tat car tac aat gtg ctt cca cag gga tgg aag gga tea cca 768 
Val Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp Lys Gly Ser Pro 
245 250 255

gca ata ttc caa agc agc atg aca aaa atc tta gag cct ttt agg aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gat ata gtt atc tat caa tac atg gat gac ttr tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Xaa Tyr Val
275 280 285

gga tct gac tta gaa ata ggg car cat aga aca aaa ata gag gaa ttg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg aag tgg gga tta acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Leu Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gat ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 88
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 88

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta agg raa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Arg Xaa Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac ata gaa ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Ile Glu Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt gtc aaa gta aga caa tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Val Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gtt ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg atg act cag ctt ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca aaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Lys Glu
115 120 125

aaa ata gaa gca tta atr gaa att tgt gma ttt ttg gaa aag gaa gga 432
Lys Ile Glu Ala Leu Xaa Glu Ile Cys Xaa Phe Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat ccg tac aac act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gga ggt act aaa tgg aga aaa ata gta gat ttc 528
Ala Ile Lys Lys Lys Gly Gly Thr Lys Trp Arg Lys Ile Val Asp Phe
165 170 175

aga gaa ctt aat aaa aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gcg ggg tta aaa aag aay aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca att ccc tta gat gaa gaa ctc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Asp Glu Glu Leu Arg
210 215 220

aag tat act gca ttt act ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tac caa tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa agt agc atg aca aaa atc tta gag ccc ttt aga aag 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc twt caw tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Xaa Xaa Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg aag cat agg gaa aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Lys His Arg Glu Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tgg aag tgg gga ttt tac aca cca gac gaa aaa cat 960
Arg Gln His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Glu Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat ctt gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Leu Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 89
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 89

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg agt ttg cca ggg aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Ser Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga caa ttt gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Phe Asp Gln Ile Pro Ile
50 55 60

gaa ata tgt gga cac aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga agg aat ctg ttg act cag ctt ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc atc agt cct att gaa cct gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Pro Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ctg gaa aaa gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctg aat aag aaa act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta acg gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aaa tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa cat agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln His Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

cag aat cca gac ata gtt atc tat caa tac gtg gat gac ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga gaa cat ctg ttg aag tgg gga ttt tac aca cca gac aaa aaa cat 960
Arg Glu His Leu Leu Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gat ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 90
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 90

cct cag atc act ctt tgg caa cga ccc aty gtc aca ata aaa gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Val Gly
1 5 10 15

gga cag cta aag gaa gct yta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aac ttg cca gga aaa tgg aaa cca aaa ata ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Ile Ile Gly
35 40 45

gga att gga ggt ttt gtc aga gta aga caa tat gat cag gta cct gta 192
Gly Ile Gly Gly Phe Val Arg Val Arg Gln Tyr Asp Gln Val Pro Val
50 55 60

gaa att tgt gga cat aaa gct ata ggt tca gta tta gta gga cca aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Ser Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg atg act cag ctt ggt ttc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Leu Gly Phe Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gar att tgt aca gaa ytg gaa aaa gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Xaa Glu Lys Glu Gly
130 135 140

aag att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag aac agt gat aga tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Asp Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gga ggg tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Gly Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttc tca gtt ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat car tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata tty caa agt agc atg aca aaa atc tta gag cct ttt agg aag 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

maa aat cca gac ata gtt atc att caa tac atg gat gat ttg tat gtr 864
Xaa Asn Pro Asp Ile Val Ile Ile Gln Tyr Met Asp Asp Leu Tyr Xaa
275 280 285

gga tct gat tta gaa ata gar cag cay aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga gat cat tta ttg agg tgg ggg ttt ttc aca cca gaa caa aaa cat 960
Arg Asp His Leu Leu Arg Trp Gly Phe Phe Thr Pro Glu Gln Lys His
305 310 315 320

cag aaa gaa cct cca ttc cat tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe His Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cat cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val His Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttr aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Xaa Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 91
<211> 1115
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1115)

<223> Portion of HIV Reverse Transcriptase
<400> 91

cct cag atc act ctt tgg caa cga ccc ctt gtc aca gta aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Val Lys Ile Gly
1 5 10 15

ggg caa cta ata gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Ile Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

ttg gaa gaa atg aat ttg cca ggg aga tgg aaa cca aaa ata ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Ile Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gtt ata rgt cca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Xaa Pro Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ttg atg act cag att ggc tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc atc agt cct att raa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Xaa Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aag gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa atc tca aaa att ggg cct gaa aac cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa aac agt act aga tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gga ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Gly Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt cct cta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aat aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

gtt aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tcg cca 768
Val Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt cag gct agc atg aca aaa atc tta gag ccg ttt aga aaa 816
Ala Ile Phe Gln Ala Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac cta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ttg ttg aaa tgg gga ttt atc aca cca gat gaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Ile Thr Pro Asp Glu Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggg tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aag tgg aca gta cag cct ata gta ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aaa tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca gg 1115
Ile Tyr Ala
370

<210> 92
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 92

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gac ata aac ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Asp Ile Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gag cag gta ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Glu Gln Val Pro Ile
50 55 60

gaa atc tgt gga cat aaa act ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Thr Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg atg act cag att ggg tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag aac agt act aga tgg aga aaa gta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Thr Arg Trp Arg Lys Val Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag acg cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa ata tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ctg gtt atc tgt caa tac atg gat gat tta tat gta 864
Gln Asn Pro Asp Leu Val Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac cta gaa ata ggg cag cat aga aca aaa ata gaa gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

agg caa cat ctg ttg aag tgg gga ttt acc aca cca gac gaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Glu Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag ccc ata gtg ctg cca gac aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Asp Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 93
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 93

cct cag atc act ctt tgg caa cga ccc atc gtc aca ata aag ata gga 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta ata gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Ile Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat tta cca gga aga tgg aca cca aaa ata ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Thr Pro Lys Ile Ile Gly
35 40 45

gga att gga ggt ttt gtc aga gta aga cag tat gaa cag ata ccc gta 192
Gly Ile Gly Gly Phe Val Arg Val Arg Gln Tyr Glu Gln Ile Pro Val
50 55 60

gaa atc tgc ggg cat aaa gct gta ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag att ggc tgt act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gat act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Asp Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca ara gtt aaa caa tgg cca ttg aca gaa gag 384
Pro Gly Met Asp Gly Pro Xaa Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ctg gaa aag gam gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Xaa Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aaa gac agt act aaa tgg aga aaa gta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Val Val Asp Phe
165 170 175

aga gaa ctt aat aaa aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg ata maa aag aac aaa tca gta aca gta ytg 624
Ile Pro His Pro Ala Gly Ile Xaa Lys Asn Lys Ser Val Thr Val Xaa
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gag gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240



att aga tat cag tac aat gta ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa aty tta gag cct ttt aga aag 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Xaa Leu Glu Pro Phe Arg Lys
260 265 270

aaa aat cca gac ata rtt atc tgc caa tac atg gat gat ttg tat gta 864
Lys Asn Pro Asp Ile Xaa Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gag cag cat aga aca aaa ata gat gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Asp Glu Leu
290 295 300

aga gac cat ctg tgg aag tgg gga ttt tac aca cca gac aac aaa yat 960
Arg Asp His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Asn Lys Xaa
305 310 315 320

cag aaa gaa cct cca ttc cgt tgg atg ggc tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Arg Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gat agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

aat tat gca gga 1116
Asn Tyr Ala Gly
370

<210> 94
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 94

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta ata gag gct cta ttg gat aca gga gca gat gat aca gta 96
Gly Gln Leu Ile Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg gat ttg cca gga aga tgg aaa cca aaa ata ata ggg 144
Leu Glu Glu Met Asp Leu Pro Gly Arg Trp Lys Pro Lys Ile Ile Gly
35 40 45

gga att gga ggt tgg atc aaa gta aga caa tat gat cag ata ccc ata 192
Gly Ile Gly Gly Trp Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa att tgt gga cat aaa gtt ata agt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Ser Thr Val Leu Val Gly Pro Thr
65 70 75 80

cca gtc aac gta att gga aga aat ctg atg act cag att ggt tgc act 288
Pro Val Asn Val Ile Gly Arg Asn Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aga gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aag ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gat ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Asp Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa gta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Val Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta ggg 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta cca aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Pro Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aaa tat act gca ttt acc ata cct agt ata aat aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

gtt aga tat cag tac aat gtg ctc cca cag ggg tgg aaa gga tca cca 768
Val Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg acc aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

cag aat cca aac ata ctt att tgt caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asn Ile Leu Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gaa cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tgg aga tgg ggg ttt tac aca cca gat aaa aaa cat 960
Arg Gln His Leu Trp Arg Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aag gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gag ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Glu Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gat ata cag aag tta gtg gga aaa ttg aat tgg gca agy cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Xaa Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 95
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 95

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga agg tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata tcc gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Ser Val
50 55 60

gaa atc tgt ggr cat aaa gct ata ggt aca gta tta rta gga cct aca 240
Glu Ile Cys Xaa His Lys Ala Ile Gly Thr Val Leu Xaa Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga agg aat ttg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac ttt tgg gar gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tac act gca ttt act ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat caa tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc cag tgt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

<221> CDS

<222> (298)...(1096)

<223> Portion of HIV Reverse Transcriptase
<400> 101

cct car atc act ctt tgg cag acc ccc ctt gtc yca ata agg aka ggg 48
Pro Gln Ile Thr Leu Trp Gln Thr Pro Leu Val Xaa Ile Arg Xaa Gly
1 5 10 15

ggr cag yta aag gaa gct tta tta gay aca gra gca gat gat mca gta 96
Xaa Gln Xaa Lys Glu Ala Leu Leu Asp Thr Xaa Ala Asp Asp Xaa Val
20 25 30

tta gaa gaa atg tat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Tyr Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aag gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cac aaa gct ata ggt aca gta ttg gta gga tct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Ser Thr
65 70 75 80

cct gtt aac ata att gga aga aat ctg ttg act cag att ggt tgc acc 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt tct att gaa act gta cca gta aga tta aag 336
Leu Asn Phe Pro Ile Ser Ser Ile Glu Thr Val Pro Val Arg Leu Lys
100 105 110

ccc gga atg gat ggc cca aaa gtt aag caa tgg cca tta aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag aac agt gat aga tgg aga aaa gta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Asp Arg Trp Arg Lys Val Val Asp Phe
165 170 175

aga gaa ctt aat aag aga acc caa gac ttt tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa agg aga aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Arg Arg Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tac ttt tca att ccc tta gat aaa gaa ttc aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Asp Lys Glu Phe Arg
210 215 220 
caa aat cca gar ata gtt atc tat caa tac atg gat gat ctg tat gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gaa cag cat aga ata aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Ile Lys Ile Glu Glu Leu
290 295 300

aga cac cat ctg ttg aaa tgg gga ttt wmc aca cca gac aaa aaa cat 960
Arg His His Leu Leu Lys Trp Gly Phe Xaa Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aar gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg
Ile Tyr Pro Gly
370



<210> 96
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 96

cct caa atc act ctt tgg caa cga ccc aat gtc aca gta aag ata ggr 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Asn Val Thr Val Lys Ile Xaa
1 5 10 15

ggg caa cta agg gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Arg Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa ata aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Ile Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att ggg ggt ttt atc aaa gta aga sag tat gat cag gta ccc gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Xaa Tyr Asp Gln Val Pro Val
50 55 60
gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga ccc aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta ara tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Xaa Leu Lys
100 105 110

cca ggr atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Xaa Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa atc tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca ata ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe
145 150 155 160

gcc ata aag aaa aaa gac ggt act aaa tgg aga aaa gta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Gly Thr Lys Trp Arg Lys Val Val Asp Phe
165 170 175

agg gaa ctc aat aag aga act caa gac ttc tgg gaa gtt caa tta ggm 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Xaa
180 185 190

ata cca cat ccc gca ggg ttg aaa aag aaa aaa tca gtr aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Xaa Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt gta aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Val Asn Asn Glu Thr Pro Gly
225 230 235 240

atc aga tat caa tac aat gtg ctt cca cag gga tgg aag gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttt caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtc atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300
aga caa cat ctg ttg aag tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gag cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cgt ata gag ctg cca gaa aag gag agc tgg act 1056
Lys Trp Thr Val Gln Arg Ile Glu Leu Pro Glu Lys Glu Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt caa 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

atw tac cca ggg 1116
Xaa Tyr Pro Gly
370

<210> 97
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 97

cct caa atc act ctt tgg caa cga ccc ctc gtc aaa ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Lys Ile Lys Ile Gly
1 5 10 15

ggg caa ata aag gaa gcy tta tta gat aca gga gca gat gat aca gtg 96
Gly Gln Ile Lys Glu Xaa Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aaa tgg aaa cca aaa ttg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Leu Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ctt ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile
50 55 60

gaa atc tgt ggc cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95
tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta cta gaa att tgt aca gaa ctg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Leu Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttt tgg gag gtt caa cta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gsa ggg tta aga aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Xaa Gly Leu Arg Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta tat gag gac tty agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Tyr Glu Asp Phe Arg
210 215 220

aaa tat act gca ttt act ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att agg tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc trt caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Xaa Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tgg cag tgg gga ttt ttc aca cca gac aaa aaa cat 960
Arg Gln His Leu Trp Gln Trp Gly Phe Phe Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335
aaa tgg aca gta cag cct ata gta ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 98
<211> 1115
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1115)

<223> Portion of HIV Reverse Transcriptase
<400> 98

cct caa atc act ctt tgg caa cga ccc gtc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Val Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg cat ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met His Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata cct gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Val
50 55 60

gaa aty tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Xaa Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca ggg atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125
aaa ata aaa gca tta gta gaa ata tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cca gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa ttg gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca gga tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tat aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gay ata gtt att tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tcc gac cta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cac ctg ttg aag tgg ggr ttt acc ack cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Xaa Phe Thr Xaa Pro Asp Lys Lys His
305 310 315 320

cag aag gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gta ctg cca gaa aaa gat agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365
att tac tca gt 1115
Ile Tyr Ser
370

<210> 99
<211> 1115
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1115)

<223> Portion of HIV Reverse Transcriptase
<400> 99

cct cag atc act ctt tgg cag cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct yta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga agr tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Xaa Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggc ttt atc aaa gta aga cag tat gat cag ata ccc cta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Leu
50 55 60

gaa atc tgt ggc cat aag gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cct gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggt cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gag atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160
gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc tca ggg tta raa aag aag aaa tca gta aca gta ctg 624
Ile Pro His Pro Ser Gly Leu Xaa Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat cca gat ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Pro Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att agg tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agc agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tac caa tac dtg gat gat ttg tak gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Xaa Asp Asp Leu Xaa Val
275 280 285

rgc tct gac tta gaa ata ggg cag cat aga gca aaa ata gag gaa ctg 912
Xaa Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtt cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca gg 1115
Ile Tyr Ala
370



<210> 100
<211> 1115
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1115)

<223> Portion of HIV Reverse Transcriptase
<400> 100

cct caa atc act ctt tgg caa cga ccc cta gtc aca ata aag ata gga 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag ctr aag gaa gct ata tta gat aca gga gca gat gat aca kta 96
Gly Gln Xaa Lys Glu Ala Ile Leu Asp Thr Gly Ala Asp Asp Thr Xaa
20 25 30

tta gaa gaa atg aat tng ccc gga aga tgg ama cca ama ttg ata ggg 144
Leu Glu Glu Met Asn Xaa Pro Gly Arg Trp Xaa Pro Xaa Leu Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gtt ata ggt aca gta ttg gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct acc aac ata att gga aga aat ctg atg act cag ctt ggt tgc act 288
Pro Thr Asn Ile Ile Gly Arg Asn Leu Met Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190
ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca ata ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Ile Leu
195 200 205

gat gtg ggc gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aaa gta tac tgc ttt acc ata cct agt ata acc aat gag acm cca ggg 720
Lys Val Tyr Cys Phe Thr Ile Pro Ser Ile Thr Asn Glu Xaa Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctg cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag ccy ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Xaa Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tgg agg tgg gga ttt tac aca cca gac aaa aaa cat 960
Arg Gln His Leu Trp Arg Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aag gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata arg ttg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Xaa Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gam ata cag aaa tta gtg gga aaa tta aat tgg gcc agt cag 1104
Val Asn Xaa Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tck cng gg 1115
Ile Xaa Xaa
370

<210> 101
<211> 1096
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease
aag tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

atc aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga gaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Glu
260 265 270

cag aat cca gac atg gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Met Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga tta ttc aca cca gac caa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Leu Phe Thr Pro Asp Gln Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat ccg gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag act ata gtg ctg cca gag aag gac agc tgg act 1056
Lys Trp Thr Val Gln Thr Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gta gga aaa ttg aat tgg g 1096
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp
355 360 365

<210> 102
<211> 1048
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1048)

<223> Portion of HIV Reverse Transcriptase
<400> 102

cct cag atc act ctt tgg cag cga ccc tty gtc aca ata aag gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Phe Val Thr Ile Lys Val Gly
1 5 10 15

ggg caa cta aag gaa gct cta ttg gat aca gga gca gat gat aca ata 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Ile
20 25 30
tta gaa gaa atg tgt ttg cca gga aga tgg aaa cca aaa ttg ata ggg 144
Leu Glu Glu Met Cys Leu Pro Gly Arg Trp Lys Pro Lys Leu Ile Gly
35 40 45

gga att gga ggt ttt gtc aaa gta aga caa tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Val Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gtt ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Val Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata gtt gga aga aat ctg ttg act cag att ggc tgt act 288
Pro Ala Asn Ile Val Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggg cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gag aag gat gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Asp Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tay aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa aat agt gat aaa tgg aga aaa gta gta gat ttc 528
Ala Ile Lys Lys Lys Asn Ser Asp Lys Trp Arg Lys Val Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtc caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gga ggg tta rag aag aac aaa tca ata aca gta ctg 624
Ile Pro His Pro Gly Gly Leu Xaa Lys Asn Lys Ser Ile Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca att ccc tta gat aaa gac ttc aga 672
Asp Val Gly Asp Ala Tyr Phe Ser Ile Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata ccy agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Xaa Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tat aat gtg ctt cca cag gga tgg aag gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gcc ata tbc caa agt agc atg aca aaa ata tta gag cct ttt aga aag 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270
caa aat cca gac ata att atc gtt caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Ile Ile Val Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gca tct gac tta gaa ata ggg cag cat aga aca aaa ata aag gaa cta 912
Ala Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Lys Glu Leu
290 295 300

aga caa tat ctg tgg gag tgg gga ttt tac aca cca gac aaa aaa cat 960
Arg Gln Tyr Leu Trp Glu Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

caa cag gaa ccc cca ttc ctc tgg atg ggg tat gag ctc cat cct gat 1008
Gln Gln Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac a 1048
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp
340 345



<210> 103
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 103

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata arg rta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Xaa Xaa Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gct gaa ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Glu Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

ccg gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95
tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ctg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta aba gaa att tgt aca gaa atg gaa aag gaa ggr 432
Lys Ile Lys Ala Leu Xaa Glu Ile Cys Thr Glu Met Glu Lys Glu Xaa
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act ccg gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aaa act caa gac ttt tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Lys Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cac ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gaa ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat aca gca ttt acc ata cct agt aca aac aat gag aca ccc agg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Arg
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tcg cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tat gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gag ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga saa cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Xaa His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335
aaa tgg aca gtr cag cct ata rag ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Xaa Gln Pro Ile Xaa Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aaa tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac gca gga 1116
Ile Tyr Ala Gly
370

<210> 104
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 104

cct cag atc act ctt tgg caa cga ccc mty gtc aca ata aag gta ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Val Gly
1 5 10 15

ggg caa tta aaa gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

cta gaa gaa ata aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Ile Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat car ata cyt ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Xaa Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttr act cag att ggc tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Xaa Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc ata agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125
aaa ata aaa gca tta gya gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Xaa Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttt tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cca gca ggg cta cca agg aaa aga tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Pro Arg Lys Arg Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat cca gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Pro Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca ccg ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gta ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gcc ata ttc caa agt agc atg aca aaa att tta gat cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Asp Pro Phe Arg Lys
260 265 270

caa aat cca gac ata att atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Ile Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gca tct gac tta gaa ata ggg cag cac aga aca aaa ata gaa gaa cta 912
Ala Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt tac aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365
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att tat gca ggg 1116 
lie Tyr Ala Gly 
370 

<210> 105 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 105 

cct cag atc act ctt tgg caa cga ccc ttc gtc gtc gta aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Phe Val Val Val Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat aat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asn Thr Val
20 25 30

ttt gaa gac ytg aat ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Phe Glu Asp Xaa Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag gta ctt gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Val Leu Val
50 55 60

gaa atc tgt gga caa aaa gct ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly Gln Lys Ala Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga agg gat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asp Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aar att tca aaa att ggg cct gaa aac cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aar tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gay ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agc ata aac aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa tgt agc atg aca aaa atc tta gat cct ttt aga aag 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Asp Pro Phe Arg Lys
260 265 270

caa aat cca gac cta gtt atc tat caa tac rtg gat gac ttg tat gta 864
Gln Asn Pro Asp Leu Val Ile Tyr Gln Tyr Xaa Asp Asp Leu Tyr Val
275 280 285

gga tct gat tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga car cat ctg ttg aag tgg gga ttt acc aca cca gac aaa aar cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370



<210> 106 
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0)...(297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 106 

cct cag atc act ctt ngg caa cga ccm att gtc aca ata aag gta ggg 48
Pro Gln Ile Thr Leu Xaa Gln Arg Xaa Ile Val Thr Ile Lys Val Gly
1 5 10 15

ggg cam tta aaa gaa gtt ytt tta gat mma gga gca gat gat cma gta 96
Gly Xaa Leu Lys Glu Val Xaa Leu Asp Xaa Gly Ala Asp Asp Xaa Val
20 25 30

tta gaa gaa atr gat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Xaa Asp Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat caa ata gtt gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Val Val
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gag gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca ttg gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa aty ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Xaa Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

agg gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg yta aaa aag aac aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Xaa Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttc tca gtt ccc tta gat aaa gac ttt agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata ccc agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tat aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc cta gag cct ttt agg aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gaa gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga gca cat ctg tta aag tgg gga ttt acc aca cca gay aaa aag cat 960
Arg Ala His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gtg cag cct ata aag ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Lys Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gat ata cag aag tta gtg gga aaa ttg aat tgg gcc agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca gga 1116
Ile Tyr Pro Gly
370

<210> 107 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease

<221> CDS

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 107 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct tta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg gaa ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Glu Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta agm cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Xaa Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa att tgt gga cat aaa gct gtg ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act aag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Lys Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att gga cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa mgg aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Xaa Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gag ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg yyt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Xaa Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tat cag tac atg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cac aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg aag tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa ccc cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg cta cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gcg agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tay gca ggg 1116
Ile Tyr Ala Gly
370

<210> 108 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 108 

cct caa atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gtg 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca ggg aaa tgg aag cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggg ttt atc aaa gta agm crg tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Xaa Xaa Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gra cat aaa gct aya ggt aca gta tta ata ggm cct act 240
Glu Ile Cys Xaa His Lys Ala Xaa Gly Thr Val Leu Ile Xaa Pro Thr
65 70 75 80

cct gtc aac ata att gga aga awt ctg atg act cag att ggg tgc act 288
Pro Val Asn Ile Ile Gly Arg Xaa Leu Met Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gag 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aag att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gct ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggt tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggg gat gca tat ttt tca gtt ccc tta gat gaa aac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asn Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gta ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa tgt agc atg aca aaa atc tta gag cct ttc aga aag 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa atg gtt atc trc caa tac gtg gat gay ttg tat gta 864
Gln Asn Pro Glu Met Val Ile Xaa Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

ggt tct gac tta gaa ata ggg cag cat aga gca aaa ata gag gaa ctr 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Ala Lys Ile Glu Glu Xaa
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctm cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Xaa His Pro Asp
325 330 335

aaa tgg aca gtg cag cat ata gaa ctg cca gaa caa gag agc tgg act 1056
Lys Trp Thr Val Gln His Ile Glu Leu Pro Glu Gln Glu Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa yta aat tgg gca agy cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Xaa Asn Trp Ala Xaa Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 109 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 109 

cct caa atc act ctt tgg caa cga ccc atc gtc aca gta aag ata gag 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Val Lys Ile Glu
1 5 10 15

ggg cag cta aag gaa gct yta tta gat aca gga gca gat aat aca gta 96
Gly Gln Leu Lys Glu Ala Xaa Leu Asp Thr Gly Ala Asp Asn Thr Val
20 25 30

ttg gam gaa ata aat ttg cca gga aga tgg aaa cca aaa atg ata ggg 144
Leu Xaa Glu Ile Asn Leu Pro Gly Arg Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gra ggt ttt atc aaa gta aam cag tat gat sag ata mcc ata 192
Gly Ile Xaa Gly Phe Ile Lys Val Xaa Gln Tyr Asp Xaa Ile Xaa Ile
50 55 60

gac atc tgt gga cat aaa gta ata ggt aca ata tta gta gga cct aca 240
Asp Ile Cys Gly His Lys Val Ile Gly Thr Ile Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga gat ctg ttg act cag att ggc tgc act 288
Pro Val Asn Ile Ile Gly Arg Asp Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gar gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aag att tca aaa att ggg cct gaa aat cca tac aac act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aag gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat tty tca gtt ccc tta gmt aaa gaa tnn nnn 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Xaa Lys Glu Xaa Xaa
210 215 220

nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn 720
Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

nnn nnn nnn nnn nnn nnn nnn nnn cca cag gga tgg aaa gga tca cca 768
Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tac car tac rtg gat gay ttg ttw gta 864
Gln Asn Pro Glu Ile Val Ile Tyr Gln Tyr Xaa Asp Asp Leu Xaa Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggy tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Xaa Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370

<210> 110 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 110 

cyt cag atc act ctt tgg caa cga ccc cts gtc aca ata aag gta ggg 48
Xaa Gln Ile Thr Leu Trp Gln Arg Pro Xaa Val Thr Ile Lys Val Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atr aat ttg cca ggr aaa tgg aaa cca awa atg ata ggg 144
Leu Glu Glu Xaa Asn Leu Pro Xaa Lys Trp Lys Pro Xaa Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ctc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca ggg atg gat ggc cca aaa gtt aag caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta ata gaa atc tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gtg aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tac act gca ttt mcc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Xaa Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa mat cca gac atg gty atc tat caa tac atg gat gat ttg tat gta 864
Gln Xaa Pro Asp Met Xaa Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggr cag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Xaa Gln His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga cag cat ttg ttg aag tgg gga ttt acc aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gag cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gag ctg cca gaa aar gam agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Glu Leu Pro Glu Lys Xaa Ser Trp Thr
340 345 350

gtc aat gac ata cag aaa ata gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Ile Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 111 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 111 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa ata aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Ile Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg agc ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Ser Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta agm cag tat gwt cat ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Xaa Gln Tyr Xaa His Ile Pro Ile
50 55 60

gaa wtc tgt ggm cat aaa gct gaa ggt aca gta tta ata gga cct aca 240
Glu Xaa Cys Xaa His Lys Ala Glu Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc ata agt cct att gaa act gta cca gta aga cta aaa 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Arg Leu Lys
100 105 110

cca gga atg gat ggg cca aaa gtt aag caa tgg cca cta aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa atc aaa gca ttg ata gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att gaa aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Glu Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata agg aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Arg Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttt tgg gaa att caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Ile Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt gta aat aat gag aca cca gga 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Val Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat caa tac aat gtg ctt cca caa gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa yta gtt atc tac caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Glu Xaa Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tca gac tta gaa ata gar aag cat aga gca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Lys His Arg Ala Lys Ile Glu Glu Leu
290 295 300

aga gaa cat ctg tya aaa tgg ggg ttt acc aca cca gac aaa aaa cat 960
Arg Glu His Leu Xaa Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttt ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag acc ata aag ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Thr Ile Lys Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gat ata cag aag tta gtg gga aaa ttg aat tgg gca agt caa 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat cca ggg 1116
Ile Tyr Pro Gly
370

<210> 112 
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116)

<223> Portion of HIV Reverse Transcriptase 
<400> 112 

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aga tgg aaa cca aaa atk ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Arg Trp Lys Pro Lys Xaa Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ctt gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Val
50 55 60

gaa att tgt gga cat aaa gct ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gag act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtc aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta atg gaa att tgt gca gaa wtg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Met Glu Ile Cys Ala Glu Xaa Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agc act aaa tgg ara aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Xaa Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aar aga act caa gac ttt tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta cta 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag acm cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Xaa Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa att tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tat caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga cag cat ctg ttg aag tgg gga ttk tmc aca cca gac aaa aaa cat 960
Arg Gln His Leu Leu Lys Trp Gly Xaa Xaa Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa saa cct cca ttc ctt tgg atg ggt tat gaa ctc cmt cct gat 1008
Gln Lys Xaa Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu Xaa Pro Asp
325 330 335

aaa tgg aca gta caa cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttr aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Xaa Asn Trp Ala Ser Gln
355 360 365

att tac gca ggg 1116
Ile Tyr Ala Gly
370

<210> 113
<211> 1116 
<212> DNA 

<213> Human Immunodeficiency Virus (HIV)

<220> 

<221> CDS 

<222> (0) . . . (297) 

<223> HIV Protease 

<221> CDS 

<222> (298) . . . (1116) 

<223> Portion of HIV Reverse Transcriptase 
<400> 113 

cct cag ate act ctt tgg caa cga ccc etc gtc aca ata aag ata ggg 4 8 

Pro Gin He Thr Leu Trp Gin Arg Pro Leu Val Thr He Lys He Gly 
15 10 15 

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg aat ttg cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ctc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile
50 55 60

gaa atc tgt gga cat aaa act ata ggt aca gta tta ata gga cct aca 240
Glu Ile Cys Gly His Lys Thr Ile Gly Thr Val Leu Ile Gly Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag ctt ggt tgt act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggt cca aga gtt aaa caa tgg cca ttg acm gaa gaa 384
Pro Gly Met Asp Gly Pro Arg Val Lys Gln Trp Pro Leu Xaa Glu Glu
115 120 125

aaa ata aaa gca tta ata gaa atc tgc aca gaa atg gaa aag gam sga 432
Lys Ile Lys Ala Leu Ile Glu Ile Cys Thr Glu Met Glu Lys Xaa Xaa
130 135 140

waa att tca aaa mta ggg cct gam wat cca tac aat act cca gta ttt 480
Xaa Ile Ser Lys Xaa Gly Pro Xaa Xaa Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cac ccg gca ggg tta aaa aag aac aaa tca gta aca gtg ttg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Asn Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gag ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Glu Phe Arg
210 215 220

aag tat act gca ttt acc ata cct agt ata aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

atc aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa tst agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Xaa Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gaa ata gtt atc tgt caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Glu Ile Val Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ttg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga gaa cat ctg ttg aag tgg gga ttt acc aca cca gat aaa aaa cat 960
Arg Glu His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gag ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat gca ggg 1116
Ile Tyr Ala Gly
370

<210> 114
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 114

cmt caa atm amt ctt tgg car mra ccc cta gtc cna awn nmm gkk agg 48
Xaa Gln Xaa Xaa Leu Trp Gln Xaa Pro Leu Val Xaa Xaa Xaa Xaa Arg
1 5 10 15

ggg gca aat aag gaa gct cta tta gac aca gga gca gat gat raca gta 96
Gly Ala Asn Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Xaa Val
20 25 30

tta gaa gaa atg wat tta cca gga aaa tgg aaa cca aaa atg ata ggg 144
Leu Glu Glu Met Xaa Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta agn cag tat gag cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Xaa Gln Tyr Glu Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta ttg gta ggm cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Xaa Pro Thr
65 70 75 80

cct gtc aac ata att gga aga aat ctg ttg act cag att ggt tgc act 288
Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gtg aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca tta aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa aaa gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtc caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gtg ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205
gac gtg ggt gat gca tat ttt tca gtt ccc tta gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tat act gca ttt tcy ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Xaa Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

agt agg tat caa tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ser Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg ata aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Ile Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca raa att gtg atc tat cma tac mtg gat gat ttg tat gta 864
Gln Asn Pro Xaa Ile Val Ile Tyr Xaa Tyr Xaa Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata gaa cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Glu Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg agg tgg gga ttt acc aca cca gac aag aaa cat 960
Arg Gln His Leu Leu Arg Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aar gaa cct ccg ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac ags ttg rct 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Xaa Leu Xaa
340 345 350

kca aat gac ata cag aag tta gtg gga aaa ttg aat tgg gca agt cag 1104
Xaa Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac tca ggg 1116
Ile Tyr Ser Gly
370



<210> 115
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 115

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta ata gat aca gga gca gat gat aca gtg 96
Gly Gln Leu Lys Glu Ala Leu Ile Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg agt ata cca gga aaa tgg aaa cca aaa ttg ata ggg 144
Leu Glu Glu Met Ser Ile Pro Gly Lys Trp Lys Pro Lys Leu Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aga cag tat gat cag gkg ccc gta 192
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Xaa Pro Val
50 55 60

gaa att tgt gga cat aaa gct ata ggt mca gtw tta ata ggm cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Xaa Xaa Leu Ile Xaa Pro Thr
65 70 75 80

cct gcc aac ata att gga agg aat ctg ttg act cag att ggt tgc act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aag caa tgg cca ttg aca gaa gag 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta aca gaa att tgt aca gaa atg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Thr Glu Ile Cys Thr Glu Met Glu Lys Glu Gly
130 135 140

aag att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gat ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa gac ttt agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu Asp Phe Arg
210 215 220

aaa tat act gca ttt acc ata cct agt gta aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Val Asn Asn Glu Thr Pro Gly
225 230 235 240
att aga tat cag tat aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa tgt agt atg aca aaa ata tta gag ccc ttt aga aaa 816
Ala Ile Phe Gln Cys Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270

caa aat cca gac cta gtt atc tat caa tac gtg gat gat ttg tat gta 864
Gln Asn Pro Asp Leu Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg ttg aaa tgg ggt ttt acc aca cca gac aaa aag cat 960
Arg Gln His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cca gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 116
<211> 1116
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1116)

<223> Portion of HIV Reverse Transcriptase
<400> 116

cct cag atc act ctt tgg caa cga ccc ctc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile Gly
1 5 10 15

ggg cag cta aag gaa gct cta tta gat aca gga gca gat gac aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30
tta gaa gaa ata agt ctg cca gga aga tgg aaa cca aaa ttg ata ggg 144
Leu Glu Glu Ile Ser Leu Pro Gly Arg Trp Lys Pro Lys Leu Ile Gly
35 40 45

gga att gga ggt ttt atc aaa gta aag cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Phe Ile Lys Val Lys Gln Tyr Asp Gln Ile Pro Ile
50 55 60

gaa atc tgt gga cat aaa gct ata ggt aca gta tta gta ggm cct aca 240
Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val Xaa Pro Thr
65 70 75 80

cct gtc aac ata gtt gga aga aat ctg ttg act cag ctt ggt tgc act 288
Pro Val Asn Ile Val Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aag gtt aag caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa ggg 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gta ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt aca aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttt tgg gaa gtt caa cta ggg 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc ttg gat aaa gac ttc agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aag tac act gca ttt acc ata cct agt ata aat aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat caa tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agt agc atg aca aaa atc tta gag cct ttt aga aaa 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Arg Lys
260 265 270
caa aat cca gac ata gtt atc tat caa tac gta gat gac ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Val Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300

aga caa cat ctg tgg aag tgg ggg ttt tac aca cca gat aaa aaa cat 960
Arg Gln His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aag gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa tta aat tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tac cca ggg 1116
Ile Tyr Pro Gly
370

<210> 117
<211> 1119
<212> DNA

<213> Human Immunodeficiency Virus (HIV)

<220>

<221> CDS

<222> (0)...(297)

<223> HIV Protease

<221> CDS

<222> (298)...(1119)

<223> Portion of HIV Reverse Transcriptase
<400> 117

cct caa atc act ctt tgg caa cga ccc atc gtc aca ata aag ata ggg 48
Pro Gln Ile Thr Leu Trp Gln Arg Pro Ile Val Thr Ile Lys Ile Gly
1 5 10 15

ggg caa cta aag gaa gct cta tta gat aca gga gca gat gat aca gta 96
Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30

tta gaa gaa atg gat ttg cca gga aga tgg aca cca aaa atg ata ggg 144
Leu Glu Glu Met Asp Leu Pro Gly Arg Trp Thr Pro Lys Met Ile Gly
35 40 45

gga att gga ggt ctt gtc aaa gta aga cag tat gat cag ata ccc ata 192
Gly Ile Gly Gly Leu Val Lys Val Arg Gln Tyr Asp Gln Ile Pro Ile
50 55 60
gaa atc tgt gga cat aaa act ata ggt aca gta tta gta gga cct aca 240
Glu Ile Cys Gly His Lys Thr Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80

cct gcc aac ata att gga aga aat ctg ttg act cag ctt ggt tgt act 288
Pro Ala Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Leu Gly Cys Thr
85 90 95

tta aat ttt ccc att agt cct att gaa act gta cca gta aaa tta aag 336
Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys
100 105 110

cca gga atg gat ggc cca aaa gtt aaa caa tgg cca ttg aca gaa gaa 384
Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Glu Glu
115 120 125

aaa ata aaa gca tta gta gaa att tgt aca gaa ttg gaa aag gaa gga 432
Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Leu Glu Lys Glu Gly
130 135 140

aaa att tca aaa att ggg cct gaa aat cca tac aat act cca gtg ttt 480
Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Val Phe
145 150 155 160

gcc ata aag aaa aaa gac agt act aaa tgg aga aaa tta gta gat ttc 528
Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe
165 170 175

aga gaa ctt aat aag aga act caa gac ttc tgg gaa gtt caa tta gga 576
Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly
180 185 190

ata cca cat cct gca gga tta aaa aag aaa aaa tca gta aca gta ctg 624
Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val Thr Val Leu
195 200 205

gat gtg ggt gat gca tat ttt tca gtt ccc tta gac aag gac ttt agg 672
Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Lys Asp Phe Arg
210 215 220

aaa tat act gca ttt acc ata cct agt aca aac aat gag aca cca ggg 720
Lys Tyr Thr Ala Phe Thr Ile Pro Ser Thr Asn Asn Glu Thr Pro Gly
225 230 235 240

att aga tat cag tac aat gtg ctt cca cag gga tgg aaa gga tca cca 768
Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro
245 250 255

gca ata ttc caa agc agc atg aca aaa atc tta gat cct ttt aga aag 816
Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Asp Pro Phe Arg Lys
260 265 270

caa aat cca gac ata gtt atc tgt caa tac atg gat gat ttg tat gta 864
Gln Asn Pro Asp Ile Val Ile Cys Gln Tyr Met Asp Asp Leu Tyr Val
275 280 285

gga tct gac tta gaa ata ggg cag cat aga aca aaa ata gag gaa ctg 912
Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu
290 295 300
aga gaa cat ctg tgg aag tgg ggg ttt tac aca cca gac aaa aaa cat 960
Arg Glu His Leu Trp Lys Trp Gly Phe Tyr Thr Pro Asp Lys Lys His
305 310 315 320

cag aaa gaa cct ccg ttc ctc tgg atg ggt tat gaa ctc cat cct gat 1008
Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp
325 330 335

aaa tgg aca gta cag cct ata gtg ctg cca gaa aaa gac agc tgg act 1056
Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp Ser Trp Thr
340 345 350

gtc aat gac ata cag aag tta gtg gga aaa ttg aac tgg gca agt cag 1104
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln
355 360 365

att tat yca ggg att 1119
Ile Tyr Xaa Gly Ile
370
visualization, computer functionality for protein sequence and structural
analyses and database searching tools. The databases may further
include observed clinical data associated with the genetic polymorphism.
The databases provide a means to design allele-specific drugs and
also to identify among alleles common or conserved structural features
that can serve as the target for drug design.

The databases can also be used for identification of invariant
residues and regions of a target biomolecule, such as an HIV protease or
reverse transcriptase. The identified invariant regions are then used to
computationally screen compounds, preferably small molecules, by
assessing binding interactions. The compounds so identified serve as
candidates for drugs that will be effective for a larger proportion of a
population or against a broader range of variants of a pathogen, where
the target protein is from a pathogen.
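
As a minimal sketch of how invariant residues and regions might be identified from variant sequences, the following Python example compares pre-aligned amino acid sequences column by column and groups consecutive invariant positions into regions. The function names invariant_positions and invariant_regions, the one-letter residue encoding with "X" standing for Xaa, the minimum region length, and the three short example sequences are illustrative assumptions and are not taken from the databases described herein. Structure-based comparison and screening of compounds by binding interactions are not shown.

```python
from typing import List, Tuple


def invariant_positions(aligned: List[str], unknown: str = "X") -> List[int]:
    """Return 1-based positions where every aligned sequence has the same residue.

    Columns containing the unknown symbol (assumed here to be "X" for Xaa)
    are never reported as invariant.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    positions = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        if len(column) == 1 and unknown not in column:
            positions.append(i + 1)
    return positions


def invariant_regions(positions: List[int], min_len: int = 3) -> List[Tuple[int, int]]:
    """Group consecutive positions into (start, end) runs of at least min_len."""
    regions = []
    start = prev = None
    for p in positions:
        if start is None:
            start = prev = p
        elif p == prev + 1:
            prev = p
        else:
            if prev - start + 1 >= min_len:
                regions.append((start, prev))
            start = prev = p
    if start is not None and prev - start + 1 >= min_len:
        regions.append((start, prev))
    return regions


# Hypothetical aligned variants, for illustration only.
variants = [
    "PQITLWQRPLVTIKIG",
    "PQITLWQRPIVTIKIG",
    "PQITLWQRPLVTIKVG",
]
positions = invariant_positions(variants)
print(invariant_regions(positions))
```

In this sketch a position counts as invariant only if all variants agree and none is ambiguous; a practical implementation might instead tolerate conservative substitutions or weight positions by the frequency of each variant in the population.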

Systems, including computers, containing the databases also are
provided herein. Any computer known to those of skill in the art for
maintaining such databases is contemplated. User interfaces for
accessing and manipulating the databases and content thereof are also
provided.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a method for creating a protein structural variant 
relational database. 

FIG. 2 is a flow chart that describes one method used to generate
structural variant models derived from genetic polymorphisms and to use
the models in structure-based drug design studies.

FIG. 3 is a flow chart that describes an alternative method used to 
generate structural variant models derived from genetic polymorphisms 
and to use the models in structure-based drug design studies. 